WEEK 5: Transformer Architecture

This week we assemble all the pieces you've learned (attention, embeddings, sequence models) into the transformer architecture that powers every major LLM. You'll see how encoders and decoders work together, understand the complete data flow from text to predictions, and practice drawing the architecture yourself. This is also exam prep week. You'll finish Portfolio Piece 1, complete your reflection, and prepare for Exam 1 on Monday.

Note: Monday Feb 16 is Presidents Day (no class). We meet Tuesday and Wednesday instead, and there is no discussion.

This week's checklist

  • Attend Lecture 7 (Tue, Feb 17): Transformer Architecture
  • Attend Lecture 8 (Wed, Feb 18): Decoding and Review
  • Complete Portfolio Piece 1 and Reflection 5, pushed to GitHub (due Friday, Feb 20 by 11:59pm)
  • Study for Exam 1 (Monday, Feb 23), which covers everything through transformers and decoding

No discussion section this week (Presidents Day week)

This week's learning objectives

After Lecture 7 (Tue 2/17) students will be able to...

  • Trace complete data flow: text → tokens → embeddings → Q/K/V → attention → predictions
  • Explain all transformer building blocks: positional encoding, residual connections, layer norm, FFN
  • Draw encoder-decoder architecture from memory
  • Distinguish encoder blocks (2 sublayers, run once over the full input) from decoder blocks (3 sublayers, run once per generated token)
  • Explain autoregressive generation and what feeds back at each step
  • Distinguish training (teacher forcing, parallel) from inference (sequential generation)

After Lecture 8 (Wed 2/18) students will be able to...

  • Explain greedy decoding and why it can produce repetitive or suboptimal outputs
  • Describe beam search: how it works, beam width, when to use it
  • Understand sampling strategies: temperature, top-k, top-p (nucleus sampling)
  • Articulate tradeoffs: deterministic vs creative, quality vs diversity
  • Connect decoding choices to real LLM behavior (why ChatGPT responses vary)
  • Recognize common decoding problems: repetition, hallucination, mode collapse
  • Feel prepared for the exam on Monday!
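The decoding strategies above can be sketched on a toy logits vector. This is a study aid with invented numbers, not how any particular LLM library implements generation; the function names (greedy, top_k_filter, top_p_filter) are ad hoc.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(s for s in scaled if s != float("-inf"))
    es = [math.exp(x - m) for x in scaled]
    s = sum(es)
    return [e / s for e in es]

def greedy(logits):
    """Always pick the single highest-scoring token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_filter(logits, k):
    """Keep only the k highest logits; mask the rest to -inf."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def top_p_filter(logits, p):
    """Nucleus sampling: keep the smallest set of tokens whose
    probabilities sum to at least p; mask the rest to -inf."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: -probs[i])
    keep, total = set(), 0.0
    for i in order:
        keep.add(i)
        total += probs[i]
        if total >= p:
            break
    return [l if i in keep else float("-inf") for i, l in enumerate(logits)]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores over a 4-token vocabulary
print(greedy(logits))                       # always the same token
print(softmax(logits, temperature=0.5))     # sharper distribution
print(softmax(logits, temperature=2.0))     # flatter distribution
```

After filtering, you would sample from softmax over the surviving logits; greedy decoding is the deterministic extreme, while raising temperature or p trades quality for diversity.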

Portfolio Piece 1: Polish a Past Lab

Due: Friday, Feb 20 by 11:59pm

Task: Take one of your past labs (Labs 1-3) and polish it into a portfolio-quality project.

What "polish" means:

  • Clean, well-documented code
  • Thoughtful analysis and insights
  • Clear visualizations

Where to find details:

  • GitHub Classroom repo README has full instructions
  • Rubric is in the assignment repo
  • You have flexibility in how you extend and improve your chosen lab

Peer Review Process: After submission, you'll be assigned 2 peers' projects to review (assigned Monday 2/23, due Wednesday 2/25). Provide specific feedback: what worked well, a substantive question, something you learned.

Week 5 Reflection Prompts

Write 300-500 words reflecting on this week's content. Pick one or two prompts that resonate, or go in your own direction:

  • Now that you've seen the full transformer architecture, what surprised you most? What design choices seem clever or confusing?
  • Temperature, top-k, top-p... which would you choose, when, and why? When is creativity in LLM outputs helpful, and when is it a problem?
  • As you prepare for the exam, what concepts from the past 5 weeks feel most central? What connections are you seeing?
  • As you polish your portfolio piece, what stands out about your learning journey?

Remember to write in your own voice, without AI assistance. These reflections are graded on completion only and help me understand what's working for you.

Exam 1 Preparation

Exam 1: Monday, Feb 23 during class (12:20-1:35pm)

Coverage: Lectures 1-8 (tokenization, embeddings, attention, transformers, decoding)

Format: Short answer, conceptual questions, worked problems. Trace data flows, draw architectures, explain mechanisms.

Study tips:

  • Practice drawing transformer architecture from memory
  • Trace examples: text → tokens → embeddings → predictions
  • Understand WHY (not just WHAT)
  • Review notesheets and lecture notes online

Key topics: Tokenization (BPE), word embeddings, attention (Q/K/V, multi-head), transformers (encoder vs decoder), decoding (greedy, beam search, sampling)
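For the BPE item in particular, it can help to work one merge step by hand. Here is a minimal sketch of a single merge on a toy corpus (the words and counts are invented, and real BPE implementations handle many details this skips):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus (BPE's core step)."""
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0]

def merge_pair(words, pair):
    """Replace every occurrence of the chosen pair with a merged symbol."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in words.items()}

# Toy corpus: words as space-separated symbols, with made-up counts
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
pair, count = most_frequent_pair(corpus)
corpus = merge_pair(corpus, pair)
print(pair, count)   # the most frequent adjacent pair and its count
print(corpus)        # corpus after one merge
```

Being able to run two or three merges like this by hand, and to trace why a given pair wins, is exactly the kind of worked problem the exam format describes.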

Resources for further learning

Core readings

Visualizations and demos

Videos

For deeper understanding