WEEK 7: Training at Scale and Post-Training
Welcome back from Exam 1! This week we shift from architecture to training - how do you actually take a transformer and turn it into a powerful LLM? Monday covers the massive engineering and data effort behind pre-training at scale: data pipelines, distributed compute, and the scaling laws that guide design decisions. Wednesday pivots to post-training: how raw pre-trained models become useful assistants like ChatGPT through instruction tuning, RLHF, and DPO.
Spring break follows this week (March 9-13).
This week's checklist
- Attend Lecture 9 (Mon, Mar 2): Training LLMs at scale
- Attend discussion section (Tue, Mar 3): Transformers in Python + project brainstorming
- Attend Lecture 10 (Wed, Mar 4): Post-training and RLHF
- Portfolio Piece 1 peer reviews due Wednesday, Mar 4 by 11:59pm (Gradescope)
- Week 7 Reflection due Friday, Mar 6 by 11:59pm (GitHub)
- Course survey due Friday, Mar 6 by 11:59pm (Gradescope, anonymous)
- Mid-course participation self-assessment due Friday, Mar 6 by 11:59pm (Gradescope)
This week's learning objectives
After Lecture 9 (Mon Mar 2) students will be able to...
- Articulate the qualitative differences between lab-scale transformers and production LLMs
- Explain pre-training objectives: next-token prediction (GPT) vs masked language modeling (BERT)
- Describe typical data sources for pre-training (Common Crawl, books, Wikipedia, code) and why data quality matters
- Recognize the scale of pre-training: trillions of tokens, weeks to months, thousands of GPUs
- Explain key distributed training strategies: data parallelism, model parallelism, pipeline parallelism
- Describe the Chinchilla scaling laws and how they shifted training practice toward smaller models trained on more data
- Explain what "emergent abilities" means and the debate around them
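To make the Chinchilla result concrete, here is a rough back-of-the-envelope sketch (not the paper's exact fitted law): the commonly cited rule of thumb is about 20 training tokens per model parameter, with training compute approximated as C ≈ 6·N·D FLOPs for N parameters and D tokens. The function name and exact constants below are illustrative.

```python
def chinchilla_optimal(params: float) -> tuple[float, float]:
    """Rough compute-optimal estimate: ~20 tokens per parameter,
    training FLOPs approximated as 6 * params * tokens."""
    tokens = 20 * params          # approximate compute-optimal token count
    flops = 6 * params * tokens   # standard training-FLOPs estimate
    return tokens, flops

# A 70B-parameter model (Chinchilla's own size):
tokens, flops = chinchilla_optimal(70e9)
print(f"tokens ~ {tokens:.2e}, training FLOPs ~ {flops:.2e}")
# ~1.4e12 tokens, matching the 70B-params / 1.4T-tokens setup in the paper
```

Compare this with GPT-3: 175B parameters trained on roughly 300B tokens, far fewer tokens per parameter than this rule suggests, which is exactly the imbalance the Chinchilla paper highlighted.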
After Lecture 10 (Wed Mar 4) students will be able to...
- Explain why post-training is necessary: base models predict tokens; they don't follow instructions
- Describe the three-stage post-training pipeline: SFT, reward model training, RLHF
- Explain how human preference rankings are collected and used to train reward models
- Describe DPO (Direct Preference Optimization) and why it simplifies RLHF
- Explain Constitutional AI: how models critique their own outputs using explicit principles
- Compare RLHF, DPO, and Constitutional AI trade-offs
- Describe common benchmarks (MMLU, TruthfulQA) and their limitations (Goodhart's Law, saturation)
- Explain why automated benchmarks are insufficient and describe alternatives (human evaluation, Chatbot Arena)
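As a preview of why DPO simplifies RLHF: instead of training a separate reward model and running PPO, DPO optimizes a single classification-style loss on preference pairs. Here is a minimal per-example sketch of the DPO objective from Rafailov et al. (2023); the argument names and toy numbers are illustrative, and real implementations batch this over tokenized sequences.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (log-ratio margin)).

    Each argument is the summed log-probability of a full response under
    the policy being trained (logp_*) or a frozen reference model (ref_logp_*).
    """
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin), written as log1p(exp(-margin)) for stability;
    # small when the policy already prefers the chosen response
    return math.log1p(math.exp(-margin))

# Toy log-probs (hypothetical): the policy favors the chosen response
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # margin > 0, loss below log(2)
```

The key point: everything is expressed in terms of the policy and a frozen reference model, so no explicit reward model or RL loop is needed.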
Discussion Section (Tue Mar 3): Transformers in Python + Project Brainstorming
This section has two parts.
Part 1: Implementing attention and transformers in Python (rescheduled from last week)
- Implement scaled dot-product attention from scratch in NumPy
- Trace data through a transformer block step by step
- Connect the math from Lectures 6-7 to working code
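If you want a head start on Part 1, the whole computation fits in a few lines of NumPy. This is one possible sketch of scaled dot-product attention, not necessarily the exact version we will build in section:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # (n_queries, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Each output row is a weighted average of the value vectors, with weights set by how well that query matches each key; the 1/sqrt(d_k) scaling keeps the softmax from saturating as d_k grows.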
Part 2: Project brainstorming
- Start thinking about what you'd like to build for the final project
- Discuss ideas with classmates - what problems interest you? What would you actually use?
- You'll have more time to formalize proposals later in the semester
Week 7 Reflection Prompts
Write 300-500 words. Some prompts to consider (you don't need to answer all of them):
- What surprised you most about the scale of pre-training? The data volume? The compute cost? Who can afford to do it?
- Scaling laws say performance improves predictably with compute. Emergent abilities suggest surprises can still happen. Do you find these ideas in tension? Does it matter, for AI risk, whether capabilities emerge suddenly or gradually?
- After learning about RLHF and post-training, how do you think about the models you use (ChatGPT, Claude) differently?
- What's the hardest part of aligning LLMs with human values? Whose values should be encoded? How do you handle disagreement across cultures or communities?
- What questions are you taking into spring break? What are you most curious about for the second half of the course?
Write in your own voice, without AI assistance. Graded on completion only.
Portfolio Piece 1 Peer Reviews
Due: Wednesday, March 4 by 11:59pm on Gradescope
Weight: 20% of your portfolio piece grade (1% of overall course grade)
Review 2 peers' Portfolio Piece 1 submissions. For each, provide:
- What worked well (2-3 specific observations)
- A substantive question showing you engaged with their work
- Something you learned from reading their project
Be specific - reference their actual code, choices, or analysis. "This was interesting" is not useful feedback. See the Participation and Assessment rubrics on the course site for guidance on what makes good peer feedback.
Mid-Course Participation Self-Assessment
Due: Friday, March 6 by 11:59pm on Gradescope
Write 1-2 paragraphs making a case for your participation score (out of 5) for the first half of the semester. Include at least 2-3 specific examples of ways you engaged - lecture questions, office hours visits, Piazza posts, helping classmates, etc. The teaching team will confirm or follow up if we see it differently.
See the Participation rubric for full details and example self-assessments.
Course Survey
Due: Friday, March 6 by 11:59pm on Gradescope
An anonymous survey to share feedback on the course so far. Takes about 15-25 minutes. Your honest input shapes how the course runs for the rest of the semester.
Resources for further learning
Pre-training and scaling
- GPT-3 Paper (Brown et al., 2020) - Language Models are Few-Shot Learners
- Chinchilla Paper (Hoffmann et al., 2022) - Training Compute-Optimal Large Language Models
- Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- The Pile - EleutherAI's open-source training dataset
Post-training and alignment
- InstructGPT paper (Ouyang et al., 2022) - The original RLHF paper for GPT-3
- DPO paper (Rafailov et al., 2023) - Direct Preference Optimization
- Constitutional AI paper (Bai et al., 2022) - Anthropic's approach
- Hugging Face RLHF blog - Illustrating Reinforcement Learning from Human Feedback
Tools
- HuggingFace Model Hub - browse pre-trained and instruction-tuned models
- HuggingFace TRL library - for working with RLHF and DPO
- Google Colab - free GPU for running small models