CSCE 689 Homework 2: Reproducing ChatGPT

1. Create an account on HPRC (https://hprc.tamu.edu/apply/)
a. Apply for a Basic Allocation on Grace (20,000 Service Units, or SUs)
b. 20,000 SUs ~= 250 A100 (40 GB) GPU hours
2. Train a variant of GPT-2
a. Limit training time and resources to at most 24 hours on one A100 40 GB GPU
b. Follow instructions at https://github.com/parasol-aser/hw-reproduce-chatgpt
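The allocation and the per-run limit above imply a rough experiment budget. The sketch below only restates the arithmetic from those figures; the variable names are illustrative, and the actual charge on Grace may differ depending on how a job requests cores and memory, so treat the result as an estimate.

```python
# Figures taken from the assignment text.
TOTAL_SUS = 20_000        # Basic Allocation on Grace
TOTAL_GPU_HOURS = 250     # approximate A100 hours that allocation buys
MAX_RUN_HOURS = 24        # per-run training limit

# Implied exchange rate (approximate, since the text says "~=").
sus_per_gpu_hour = TOTAL_SUS / TOTAL_GPU_HOURS

# Cost of one full-length run and how many such runs fit in the allocation.
sus_per_full_run = sus_per_gpu_hour * MAX_RUN_HOURS
full_runs = int(TOTAL_SUS // sus_per_full_run)

# Whatever is left over can fund short debugging or tuning jobs.
leftover_sus = TOTAL_SUS - full_runs * sus_per_full_run
leftover_hours = leftover_sus / sus_per_gpu_hour

print(f"SUs per GPU hour    : {sus_per_gpu_hour:.0f}")
print(f"SUs per 24-hour run : {sus_per_full_run:.0f}")
print(f"Full 24-hour runs   : {full_runs}")
print(f"Leftover GPU hours  : {leftover_hours:.0f}")
```

Under these numbers the allocation covers about ten full-length runs, so it may be worth spending part of it on short pilot runs before committing to a 24-hour job.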
Your Goal: train the best GPT model from scratch within the resource budget
– Top 10 submissions with the highest HellaSwag accuracy will each earn 1 bonus point
– Top 3 will earn 4, 2, and 1 additional bonus points, respectively
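One way to translate "within the resource budget" into a starting model size is a compute-optimal rule of thumb from the scaling-law literature. The sketch below assumes training cost C ≈ 6·N·D FLOPs (N parameters, D tokens) and roughly 20 tokens per parameter. The 312 TFLOPS figure is NVIDIA's published dense BF16 peak for the A100; the 40% utilization is a hypothetical placeholder that should be replaced by a measured value, and none of these constants come from the assignment itself.

```python
import math

A100_PEAK_BF16_FLOPS = 312e12  # published dense BF16 peak for the A100
ASSUMED_MFU = 0.40             # hypothetical utilization; measure your own run
TRAIN_SECONDS = 24 * 3600      # 24-hour limit from the assignment
TOKENS_PER_PARAM = 20          # rule-of-thumb ratio (assumption)


def compute_optimal_size(total_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Solve C = 6 * N * D with D = tokens_per_param * N for N and D."""
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


total_flops = A100_PEAK_BF16_FLOPS * ASSUMED_MFU * TRAIN_SECONDS
n_params, n_tokens = compute_optimal_size(total_flops)

print(f"compute budget   : {total_flops:.2e} FLOPs")
print(f"suggested params : {n_params / 1e6:.0f}M")
print(f"suggested tokens : {n_tokens / 1e9:.1f}B")
```

With these assumed constants the estimate lands near 300M parameters and 6B tokens. It is only a starting point: the ranking metric is HellaSwag accuracy rather than training loss, so the best configuration for this assignment may sit elsewhere.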
Your Strategies:
– Tune hyper-parameters guided by the scaling laws
– Try different architectures, e.g.:
  – Group Query Attention
  – Replace LayerNorm with RMSNorm
  – Replace absolute positional encoding with RoPE
  – Replace the GeLU activation function with SwiGLU
  – Drop positional encoding
  – Change KQV (e.g., merge K and Q)
  – Eliminate or modify FFN layers
  – Mixture of Experts (MoE)
  – …
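Two of the swaps listed above, RMSNorm for LayerNorm and SwiGLU for GeLU, are small enough to sketch in plain Python. This is a minimal illustration on single vectors, not code from the course repository or llm.c; all function names and the toy weights are hypothetical, and a real implementation would use batched tensor operations.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned scale/bias: subtract mean, divide by std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, weight=None, eps=1e-5):
    """RMSNorm: divide by the root-mean-square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if weight is None:
        weight = [1.0] * len(x)
    return [w * v / rms for w, v in zip(weight, x)]


def gelu(v):
    """tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)
    return 0.5 * v * (1.0 + math.tanh(inner))


def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gelu_ffn(x, w_up, w_down):
    """Baseline FFN: down(gelu(up(x))), two weight matrices."""
    return matvec(w_down, [gelu(h) for h in matvec(w_up, x)])


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: down(silu(gate(x)) * up(x)), three weight matrices."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


# Toy weights: model width 2, hidden width 3 (values are arbitrary).
x = [1.0, -1.0]
w_gate = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
w_up = [[0.2, 0.1], [-0.3, 0.5], [0.4, -0.1]]
w_down = [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.2]]

normed = rms_norm([1.0, 2.0, 3.0, 4.0])
mean_square = sum(v * v for v in normed) / len(normed)
out_gelu = gelu_ffn(x, w_up, w_down)
out_swiglu = swiglu_ffn(x, w_gate, w_up, w_down)

print("RMSNorm output mean square:", round(mean_square, 4))
print("GeLU FFN output  :", out_gelu)
print("SwiGLU FFN output:", out_swiglu)
```

SwiGLU uses three projection matrices where the baseline FFN uses two, so a common choice is to shrink the hidden width to about two thirds of the original to keep the parameter count comparable. Whether either swap helps within a 24-hour budget is something the experiments have to show.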
Submission (5pt):
– Your final model checkpoint and original logs, stored on Grace (2pt)
  – You need to share the folder with our grader
– Your training code (only a diff is required if it is based on karpathy/llm.c) (1pt)
– A report that describes your solution and results, including any remaining challenges and failures (2pt)
  – Limit the report to three pages at 10pt font size