1. Create an account on HPRC (https://hprc.tamu.edu/apply/)
a. Apply for Basic Allocation on Grace (20,000 Service Units)
b. 20,000 Service Units (SUs) ~= 250 A100 (40G) GPU hours
2. Train a variant of GPT-2
a. Limit training time/resources to a maximum of 24 hours on one A100 40G GPU
b. Follow instructions at https://github.com/parasol-aser/hw-reproduce-chatgpt
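The budget arithmetic above can be sketched as follows. Note that the ~80 SUs per GPU-hour rate is inferred from the two figures given (20,000 SUs ≈ 250 A100 hours), not an official HPRC billing rate:

```python
# Implied billing rate: 20,000 SUs ~= 250 A100 (40G) GPU-hours,
# i.e. roughly 80 SUs per A100 GPU-hour (inferred, not official).
SU_BUDGET = 20_000
SU_PER_A100_HOUR = SU_BUDGET / 250        # = 80.0
run_hours = 24                            # per-run cap with one A100 40G
cost_per_run = run_hours * SU_PER_A100_HOUR   # SUs consumed by one max-length run
runs_in_budget = SU_BUDGET // int(cost_per_run)
print(cost_per_run, runs_in_budget)       # 1920.0 SUs/run, ~10 full 24-hour runs
```

So the allocation covers roughly ten maximum-length training runs, which is worth keeping in mind when planning hyper-parameter sweeps.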
Your Goal: train the best GPT model from scratch within the resource budget
– The top 10 submissions with the highest HellaSwag accuracy will each earn 1 bonus point
– The top 3 will earn 4, 2, and 1 additional bonus points, respectively
Your Strategies:
– Tune hyper-parameters guided by the scaling laws
– Try different architectures, e.g.:
– Group Query Attention
– Replace LayerNorm by RMSNorm
– Replace absolute positional encoding by RoPE
– Replace GeLU activation function by SwiGLU
– Drop Positional Encoding
– Change KQV (e.g., merge K and Q)
– Eliminate or modify FFN layers
– Mixture of Experts (MoE)
– …
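Several of the listed swaps are small, local changes to the GPT-2 block. For example, RMSNorm drops LayerNorm's mean subtraction and bias, normalizing only by the root-mean-square. A minimal numpy sketch (illustrative, not taken from the starter code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square over the feature dim.
    # Unlike LayerNorm, there is no mean subtraction and no bias term,
    # which saves one reduction per call and one learned parameter vector.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.ones(4)                # learned gain, initialized to 1
y = rms_norm(x, w)            # output has RMS ~= 1 along the last axis
```

In a transformer block this is a drop-in replacement at the two normalization sites, with `weight` the only learned parameter.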
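RoPE, another listed option, replaces GPT-2's learned absolute position embeddings by rotating pairs of query/key dimensions through position-dependent angles. A hedged numpy sketch of the rotation applied to one attention head (the pairing convention and `base` follow the common RoPE formulation; implementations vary in layout details):

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, head_dim) slice of queries or keys; head_dim must be even.
    # Each consecutive pair of dims is rotated by angle pos * freq_i.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-2.0 * np.arange(half) / dim)        # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :] # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                       # paired dims
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                    # 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each step is a pure rotation, vector norms are preserved and position 0 is left unchanged; the useful property is that q·k after RoPE depends only on the relative distance between positions.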
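The SwiGLU swap changes the MLP from `GELU(x @ W_in) @ W_out` to a gated form with three projections. A minimal sketch (weight names `W_gate`/`W_up`/`W_down` are illustrative; note the hidden width is often reduced to ~2/3 of the GELU MLP's to keep the parameter count comparable, since there are three matrices instead of two):

```python
import numpy as np

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward: silu(x @ W_gate) gates (x @ W_up) elementwise,
    # then W_down projects back to the model dimension.
    def silu(z):
        return z / (1.0 + np.exp(-z))   # SiLU / swish activation
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
x = rng.standard_normal((2, d_model))
W_gate = rng.standard_normal((d_model, d_hidden))
W_up = rng.standard_normal((d_model, d_hidden))
W_down = rng.standard_normal((d_hidden, d_model))
y = swiglu_ffn(x, W_gate, W_up, W_down)   # shape (2, d_model)
```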
Submission (5pt):
– Your final model checkpoint and original logs stored on Grace (2pt)
– Need to share a folder with our grader
– Your training code (only diff is required if based on karpathy/llm.c) (1pt)
– A report that describes your solution and results (including remaining challenges and failures if any) (2pt)
– Limit your report to three pages with 10pt font size