WEEK 5: Transformer Architecture

This week we assemble all the pieces you've learned (attention, embeddings, sequence models) into the transformer architecture that powers every major LLM. You'll see how encoders and decoders work together, understand the complete data flow from text to predictions, and practice drawing the architecture yourself. This is also exam prep week. You'll finish Portfolio Piece 1, complete your reflection, and prepare for Exam 1 on Monday.

Note: Monday Feb 16 is Presidents Day (no class). We meet Tuesday and Wednesday instead, and there is no discussion.

This week's checklist

  • Attend Lecture 7 (Tue, Feb 17): Transformer Architecture
  • Attend Lecture 8 (Wed, Feb 18): Decoding and Review
  • Complete Portfolio Piece 1 and Reflection 5, pushed to GitHub (due Friday, Feb 20 by 11:59pm)
  • Study for Exam 1 (Monday, Feb 23), which covers everything through transformers and decoding

No discussion section this week (Presidents Day week)

This week's learning objectives

After Lecture 7 (Tue 2/17) students will be able to...

  • Trace complete data flow: text → tokens → embeddings → Q/K/V → attention → predictions
  • Explain all transformer building blocks: positional encoding, residual connections, layer norm, FFN
  • Draw encoder-decoder architecture from memory
  • Distinguish encoder blocks (2 sublayers, run once over the full input) from decoder blocks (3 sublayers, run once per generated token)
  • Explain autoregressive generation and what feeds back at each step
  • Distinguish training (teacher forcing, parallel) from inference (sequential generation)

After Lecture 8 (Wed 2/18) students will be able to...

  • Explain greedy decoding and why it can produce repetitive or suboptimal outputs
  • Describe beam search: how it works, beam width, when to use it
  • Understand sampling strategies: temperature, top-k, top-p (nucleus sampling)
  • Articulate tradeoffs: deterministic vs creative, quality vs diversity
  • Connect decoding choices to real LLM behavior (why ChatGPT responses vary)
  • Recognize common decoding problems: repetition, hallucination, mode collapse
  • Feel prepared for the exam on Monday!
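The decoding strategies above can be sketched on a toy logits vector. This is a study aid with invented numbers, not how any particular LLM library implements generation; the function names (greedy, top_k_filter, top_p_filter) are ad hoc.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(s for s in scaled if s != float("-inf"))
    es = [math.exp(x - m) for x in scaled]
    s = sum(es)
    return [e / s for e in es]

def greedy(logits):
    """Always pick the single highest-scoring token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_filter(logits, k):
    """Keep only the k highest logits; mask the rest to -inf."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def top_p_filter(logits, p):
    """Nucleus sampling: keep the smallest set of tokens whose
    probabilities sum to at least p; mask the rest to -inf."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: -probs[i])
    keep, total = set(), 0.0
    for i in order:
        keep.add(i)
        total += probs[i]
        if total >= p:
            break
    return [l if i in keep else float("-inf") for i, l in enumerate(logits)]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores over a 4-token vocabulary
print(greedy(logits))                       # always the same token
print(softmax(logits, temperature=0.5))     # sharper distribution
print(softmax(logits, temperature=2.0))     # flatter distribution
```

After filtering, you would sample from softmax over the surviving logits; greedy decoding is the deterministic extreme, while raising temperature or p trades quality for diversity.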

Portfolio Piece 1: Polish a Past Lab

Due: Friday, Feb 20 by 11:59pm

Task: Take one of your past labs (Labs 1-3) and polish it into a portfolio-quality project.

What "polish" means:

  • Clean, well-documented code
  • Thoughtful analysis and insights
  • Clear visualizations

Where to find details:

  • GitHub Classroom repo README has full instructions
  • Rubric is in the assignment repo
  • You have flexibility in how you extend and improve your chosen lab

Peer Review Process: After submission, you'll be assigned 2 peers' projects to review (assigned Monday 2/23, due Wednesday 2/25). Provide specific feedback: what worked well, a substantive question, something you learned.

Week 5 Reflection Prompts

Write 300-500 words reflecting on this week's content. Pick one or two prompts that resonate, or go in your own direction:

  • Now that you've seen the full transformer architecture, what surprised you most? What design choices seem clever or confusing?
  • Temperature, top-k, top-p... which would you choose, when, and why? When is creativity in LLM outputs helpful, and when is it a problem?
  • As you prepare for the exam, what concepts from the past 5 weeks feel most central? What connections are you seeing?
  • As you polish your portfolio piece, what stands out about your learning journey?

Remember to write in your own voice, without AI assistance. These reflections are graded on completion only and help me understand what's working for you.

Exam 1 Preparation

Exam 1: Monday, Feb 23 during class (12:20-1:35pm)

Coverage: Lectures 1-8 (tokenization, embeddings, attention, transformers, decoding)

Format: Short answer, conceptual questions, worked problems. Trace data flows, draw architectures, explain mechanisms.

Study tips:

  • Practice drawing transformer architecture from memory
  • Trace examples: text → tokens → embeddings → predictions
  • Understand WHY (not just WHAT)
  • Review notesheets and lecture notes online

Key topics: Tokenization (BPE), word embeddings, attention (Q/K/V, multi-head), transformers (encoder vs decoder), decoding (greedy, beam search, sampling)
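For the BPE item in particular, it can help to work one merge step by hand. Here is a minimal sketch of a single merge on a toy corpus (the words and counts are invented, and real BPE implementations handle many details this skips):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus (BPE's core step)."""
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0]

def merge_pair(words, pair):
    """Replace every occurrence of the chosen pair with a merged symbol."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in words.items()}

# Toy corpus: words as space-separated symbols, with made-up counts
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
pair, count = most_frequent_pair(corpus)
corpus = merge_pair(corpus, pair)
print(pair, count)   # the most frequent adjacent pair and its count
print(corpus)        # corpus after one merge
```

Being able to run two or three merges like this by hand, and to trace why a given pair wins, is exactly the kind of worked problem the exam format describes.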

Resources for further learning

Core readings

Visualizations and demos

Videos

For deeper understanding