WEEK 3: Deep Learning Fundamentals & Tokenization

This week we dive into the foundations of modern NLP. On Monday we'll explore how neural networks learn through backpropagation and gradient descent. Tuesday's discussion gets you hands-on with PyTorch. Then on Wednesday we'll see how text is split into tokens - a critical step that affects everything downstream.

This week's checklist (due Friday 2/6)

  • Attend Lecture 3 (Mon, Feb 3): Deep learning fundamentals
  • Attend Discussion Section (Tue, Feb 4): PyTorch hands-on
  • Attend Lecture 4 (Wed, Feb 5): Tokenization
  • Complete Week 3 Reflection and Lab, pushed to GitHub

This week's learning objectives

After Lecture 3 (Mon 2/3) students will be able to...

Neural Networks:

  • Explain how neural networks transform inputs through layers of weighted sums and activations
  • Understand backpropagation as efficient application of the chain rule
  • Describe gradient descent and how it minimizes loss functions
  • Recognize why depth matters: hierarchical feature learning
  • Identify the sequence modeling challenge for feed-forward networks
  • Discuss the computational and environmental costs of training large models

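To make gradient descent and the chain rule concrete, here is a tiny illustrative sketch (not course-provided code): a one-parameter model fit with squared-error loss, where the gradient is derived by hand.

```python
# Gradient descent on a one-parameter model y_hat = w * x,
# fitting points that lie on y = 2x with squared-error loss.
# The gradient comes from the chain rule:
#   L = (w*x - y)^2  =>  dL/dw = 2 * (w*x - y) * x

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on y = 2x
w = 0.0    # start far from the true weight
lr = 0.05  # learning rate (step size)

for step in range(100):
    w -= lr * grad(w, data)  # step downhill on the loss surface

print(round(w, 3))  # converges toward the true weight, 2.0
```

Backpropagation is this same chain-rule bookkeeping applied layer by layer through a deep network, which is what makes training many weights at once tractable.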
After Lecture 4 (Wed 2/5) students will be able to...

Tokenization:

  • Explain why tokenization choices affect model behavior (e.g., why LLMs struggle to count letters)
  • Describe historical approaches: word-level, stemming, lemmatization
  • Explain how subword tokenization (BPE, WordPiece) handles vocabulary challenges
  • Understand the role of special tokens in chat models (system, user, assistant)
  • Use tokenizer tools to see how models "see" text
  • Discuss fairness implications of tokenization across languages
  • Preview: understand that tokens become embeddings (vectors that capture meaning)
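As a preview of how subword tokenizers are built, here is a heavily simplified sketch of BPE training: start from characters and repeatedly merge the most frequent adjacent pair. Real implementations (tiktoken, Hugging Face tokenizers) differ in many details; this is only to show the core idea.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from characters and apply a few merges, as BPE training does.
tokens = list("low lower lowest")
for _ in range(3):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # frequent fragments like "low" fuse into single tokens
```

After a few merges, the common stem "low" becomes one token while rarer suffixes stay split - exactly the behavior that lets subword vocabularies cover unseen words.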

Discussion Section (Tue 2/4): PyTorch Hands-On

Please bring your laptop to discussion! You will be coding during the class.

What you'll do:

  1. Get PyTorch installed and running (if not already)
  2. Build a simple neural network from scratch
  3. Train it on a toy task (e.g., simple classification)
  4. Experiment with different architectures: more layers, different activations
  5. See backpropagation in action with loss.backward()
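Steps 2-5 above might look roughly like the following sketch (assuming a working PyTorch install; your discussion notebook will differ in details):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible weights

# Toy task: learn XOR, which a single linear layer cannot solve.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A small feed-forward network: weighted sums + a nonlinearity.
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)

initial_loss = loss_fn(model(X), y).item()
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()  # backpropagation: gradients via the chain rule
    opt.step()       # gradient descent: update weights downhill

final_loss = loss_fn(model(X), y).item()
print(initial_loss, final_loss)  # loss should drop substantially
```

Try swapping `nn.ReLU()` for `nn.Tanh()`, adding layers, or changing `lr` to see how each choice affects the loss curve.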

Week 3 Reflection Prompts

Write 300-500 words reflecting on this week's content, or the area in general. Some prompts to consider:

  • If you are new to deep learning, what clicked for you, and what questions do you still have? If you've studied the subject before, did you learn something or gain a new perspective?
  • After learning about tokenization, were you surprised by how LLMs "see" text? What implications does this have?
  • What do you think about the tokenization fairness discussion? Should companies address language efficiency differences? How?
  • What connections do you see between tokenization choices and model capabilities?
  • We discussed the environmental and financial costs of training large models. Who should bear these costs? Should there be regulations?

Remember to write in your own voice, without AI assistance. These reflections are graded on completion only and help me understand what's working for you.

Lab 2: Neural Networks and/or Tokenization Exploration

Due: Friday, Feb 6 by 11:59pm

Choose your focus (or do both, or something else!):

Option A: Neural Network Exploration

  • Build a simple neural network in PyTorch from scratch
  • Train it on a task (XOR, MNIST digits, simple classification)
  • Experiment: What happens as you add layers? Change activation functions? Adjust learning rate?
  • Visualize the loss curve - can you see gradient descent working?

Option B: Tokenization Exploration

  • Experiment with tokenizers from different providers (e.g., OpenAI's via tiktoken, Anthropic's Claude tokenizer)
  • Compare token counts: code vs prose, English vs other languages, emojis
  • Investigate the "strawberry" problem - why can't LLMs count letters?
  • Explore fairness: same content in different languages, how do token counts differ?
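A starting point for the "strawberry" investigation: once text is tokenized, the model receives opaque token IDs, not characters, so letter counts aren't directly visible. The split and IDs below are made up for illustration, not a real tokenizer's output.

```python
# Hypothetical subword split of "strawberry"; real tokenizers vary.
tokens = ["str", "aw", "berry"]

# A human counting characters sees every 'r':
char_count = "strawberry".count("r")  # 3

# A model sees token IDs; the letters inside each token are hidden.
vocab = {"str": 496, "aw": 675, "berry": 19772}  # made-up IDs
ids = [vocab[t] for t in tokens]

print(char_count, ids)
```

Compare this with what a real tokenizer playground shows for the same word - the boundaries rarely line up with where the letters you want to count actually sit.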

Option C: Connect the Two

  • Tokenize some text, convert to simple numerical representations
  • Feed through a neural network for a simple task
  • See the full pipeline: text, tokens, numbers, neural network, predictions
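For Option C, the full pipeline can be sketched end to end in a few lines. This toy version uses a naive whitespace tokenizer and untrained random embeddings (all names and numbers here are illustrative, not from a real model) just to show how text becomes numbers becomes a prediction.

```python
import random

random.seed(0)

# 1. Text -> tokens (naive whitespace split; real systems use subwords)
def tokenize(text):
    return text.lower().split()

# 2. Tokens -> numbers (integer IDs from a growing vocabulary)
vocab = {}
def token_ids(tokens):
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

# 3. Numbers -> vectors -> prediction (random embedding per ID,
#    averaged, then a weighted sum; untrained, shapes only)
DIM = 4
embeddings = {}
def embed(i):
    if i not in embeddings:
        embeddings[i] = [random.uniform(-1, 1) for _ in range(DIM)]
    return embeddings[i]

def predict(text):
    ids = token_ids(tokenize(text))
    vecs = [embed(i) for i in ids]
    avg = [sum(v[d] for v in vecs) / len(vecs) for d in range(DIM)]
    weights = [0.5] * DIM  # stand-in for learned weights
    return sum(w * a for w, a in zip(weights, avg))

score = predict("the quick brown fox")
print(len(vocab), score)
```

Swapping in a real tokenizer and a trained PyTorch network turns this skeleton into the actual pipeline the lab asks about.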

Resources for further learning

On neural networks

On tokenization

Tutorials