WEEK 7: Training at Scale and Post-Training
Welcome back from Exam 1! This week we shift from architecture to training - how do you actually take a transformer and turn it into a powerful LLM? Monday covers the massive engineering and data effort behind pre-training at scale: data pipelines, distributed compute, and the scaling laws that guide design decisions. Wednesday pivots to post-training: how raw pre-trained models become useful assistants like ChatGPT through instruction tuning, RLHF, and DPO.
Spring break follows this week (March 9-13).
This week's checklist
- Attend Lecture 9 (Mon, Mar 2): Training LLMs at scale
- Attend discussion section (Tue, Mar 3): Transformers in Python + project brainstorming
- Attend Lecture 10 (Wed, Mar 4): Post-training and RLHF
- Portfolio Piece 1 peer reviews due Wednesday, Mar 4 by 11:59pm (Gradescope)
- Week 7 Reflection due Friday, Mar 6 by 11:59pm (GitHub)
- Course survey due Friday, Mar 6 by 11:59pm (Gradescope, anonymous)
- Mid-course participation self-assessment due Friday, Mar 6 by 11:59pm (Gradescope)
This week's learning objectives
After Lecture 9 (Mon Mar 2) students will be able to...
- Articulate the qualitative differences between lab-scale transformers and production LLMs
- Explain pre-training objectives: next-token prediction (GPT) vs masked language modeling (BERT)
- Describe typical data sources for pre-training (Common Crawl, books, Wikipedia, code) and why data quality matters
- Recognize the scale of pre-training: trillions of tokens, weeks to months, thousands of GPUs
- Explain key distributed training strategies: data parallelism, model parallelism, pipeline parallelism
- Describe the Chinchilla scaling laws and how they shifted training practice toward smaller models trained on more data
- Explain what "emergent abilities" means and the debate around them
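To make the Chinchilla result concrete, here is a rough back-of-the-envelope sketch (not the paper's exact fitted law): the commonly cited rule of thumb is about 20 training tokens per model parameter, with training compute approximated as C ≈ 6·N·D FLOPs for N parameters and D tokens. The function name and exact constants below are illustrative.

```python
def chinchilla_optimal(params: float) -> tuple[float, float]:
    """Rough compute-optimal estimate: ~20 tokens per parameter,
    training FLOPs approximated as 6 * params * tokens."""
    tokens = 20 * params          # approximate compute-optimal token count
    flops = 6 * params * tokens   # standard training-FLOPs estimate
    return tokens, flops

# A 70B-parameter model (Chinchilla's own size):
tokens, flops = chinchilla_optimal(70e9)
print(f"tokens ~ {tokens:.2e}, training FLOPs ~ {flops:.2e}")
# ~1.4e12 tokens, matching the 70B-params / 1.4T-tokens setup in the paper
```

Compare this with GPT-3: 175B parameters trained on roughly 300B tokens, far fewer tokens per parameter than this rule suggests, which is exactly the imbalance the Chinchilla paper highlighted.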
After Lecture 10 (Wed Mar 4) students will be able to...
- Explain why post-training is necessary: base models predict tokens; they don't follow instructions
- Describe the three-stage post-training pipeline: SFT, reward model training, RLHF
- Explain how human preference rankings are collected and used to train reward models
- Describe DPO (Direct Preference Optimization) and why it simplifies RLHF
- Explain Constitutional AI: how models critique their own outputs using explicit principles
- Compare RLHF, DPO, and Constitutional AI trade-offs
- Describe common benchmarks (MMLU, TruthfulQA) and their limitations (Goodhart's Law, saturation)
- Explain why automated benchmarks are insufficient and describe alternatives (human evaluation, Chatbot Arena)
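As a preview of why DPO simplifies RLHF: instead of training a separate reward model and running PPO, DPO optimizes a single classification-style loss on preference pairs. Here is a minimal per-example sketch of the DPO objective from Rafailov et al. (2023); the argument names and toy numbers are illustrative, and real implementations batch this over tokenized sequences.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (log-ratio margin)).

    Each argument is the summed log-probability of a full response under
    the policy being trained (logp_*) or a frozen reference model (ref_logp_*).
    """
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin), written as log1p(exp(-margin)) for stability;
    # small when the policy already prefers the chosen response
    return math.log1p(math.exp(-margin))

# Toy log-probs (hypothetical): the policy favors the chosen response
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # margin > 0, loss below log(2)
```

The key point: everything is expressed in terms of the policy and a frozen reference model, so no explicit reward model or RL loop is needed.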
Discussion Section (Tue Mar 3): Transformers in Python + Project Brainstorming
This section has two parts.
Part 1: Implementing attention and transformers in Python (rescheduled from last week)
- Implement scaled dot-product attention from scratch in NumPy
- Trace data through a transformer block step by step
- Connect the math from Lectures 6-7 to working code
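If you want a head start on Part 1, the whole computation fits in a few lines of NumPy. This is one possible sketch of scaled dot-product attention, not necessarily the exact version we will build in section:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # (n_queries, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Each output row is a weighted average of the value vectors, with weights set by how well that query matches each key; the 1/sqrt(d_k) scaling keeps the softmax from saturating as d_k grows.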
Part 2: Project brainstorming
- Start thinking about what you'd like to build for the final project
- Discuss ideas with classmates - what problems interest you? What would you actually use?
- You'll have more time to formalize proposals later in the semester
Week 7 Reflection Prompts
Write 300-500 words. Some prompts to consider (you don't need to answer all of them):
- What surprised you most about the scale of pre-training? The data volume? The compute cost? Who can afford to do it?
- Scaling laws say performance improves predictably with compute. Emergent abilities suggest surprises can still happen. Do you find these ideas in tension? Does it matter, for AI risk, whether capabilities emerge suddenly or gradually?
- After learning about RLHF and post-training, how do you think about the models you use (ChatGPT, Claude) differently?
- What's the hardest part of aligning LLMs with human values? Whose values should be encoded? How do you handle disagreement across cultures or communities?
- What questions are you taking into spring break? What are you most curious about for the second half of the course?
Write in your own voice, without AI assistance. Graded on completion only.
Portfolio Piece 1 Peer Reviews
Due: Wednesday, March 4 by 11:59pm on Gradescope
Weight: 20% of your portfolio piece grade (1% of overall course grade)
Review 2 peers' Portfolio Piece 1 submissions. For each, provide:
- What worked well (2-3 specific observations)
- A substantive question showing you engaged with their work
- Something you learned from reading their project
Be specific - reference their actual code, choices, or analysis. "This was interesting" is not useful feedback. See the Participation and Assessment rubrics on the course site for guidance on what makes good peer feedback.
Mid-Course Participation Self-Assessment
Due: Friday, March 6 by 11:59pm on Gradescope
Write 1-2 paragraphs making a case for your participation score (out of 5) for the first half of the semester. Include at least 2-3 specific examples of ways you engaged - lecture questions, office hours visits, Piazza posts, helping classmates, etc. The teaching team will confirm or follow up if we see it differently.
See the Participation rubric for full details and example self-assessments.
Course Survey
Due: Friday, March 6 by 11:59pm on Gradescope
An anonymous survey to share feedback on the course so far. Takes about 15-25 minutes. Your honest input shapes how the course runs for the rest of the semester.
Resources for further learning
Pre-training and scaling
- GPT-3 Paper (Brown et al., 2020) - Language Models are Few-Shot Learners
- Chinchilla Paper (Hoffmann et al., 2022) - Training Compute-Optimal Large Language Models
- Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- The Pile - EleutherAI's open-source training dataset
Post-training and alignment
- InstructGPT paper (Ouyang et al., 2022) - The original RLHF paper for GPT-3
- DPO paper (Rafailov et al., 2023) - Direct Preference Optimization
- Constitutional AI paper (Bai et al., 2022) - Anthropic's approach
- Hugging Face RLHF blog - Illustrating Reinforcement Learning from Human Feedback
Tools
- HuggingFace Model Hub - browse pre-trained and instruction-tuned models
- HuggingFace TRL library - for working with RLHF and DPO
- Google Colab - free GPU for running small models