WEEK 10: Retrieval-Augmented Generation (RAG)

RAG is one of the most immediately useful techniques for building real LLM applications. It addresses three fundamental problems: LLMs have knowledge cutoffs, they hallucinate on specific facts, and they can't access private or proprietary information. Retrieval mitigates all three. Monday introduces the core architecture: embed documents, store them in a vector database, retrieve relevant chunks, and inject them into the prompt. Wednesday goes deeper: advanced retrieval strategies, evaluation, and what makes production RAG systems actually work.
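The core loop (embed, store, retrieve, augment) can be sketched in a few lines. This is a toy illustration only: the bag-of-words "embedding" and the in-memory list stand in for a real embedding model and a vector database, and the documents are invented.

```python
# Toy RAG pipeline: embed -> store -> retrieve -> augment.
# A real system swaps in a dense embedding model and a vector DB.
from collections import Counter
import math

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our headquarters are located in Berlin, Germany.",
    "Support is available by email around the clock.",
]

def embed(text: str) -> Counter:
    # Toy embedding: word counts. Stand-in for a dense vector model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Store": a list of (doc, vector) pairs; a vector DB does this at scale.
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# "Augment": inject the retrieved chunks into the prompt before generation.
context = "\n".join(retrieve("How many days do I have to return an item?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```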

This week's checklist

  • Attend Lecture 15 (Mon, Mar 30): RAG Part 1 - Architecture and Foundations
  • Attend discussion section (Tue, Mar 31): Tools for RAG implementation and evaluation
  • Attend Lecture 16 (Wed, Apr 1): RAG Part 2 - Advanced Techniques and Evaluation
  • Submit Week 10 Lab (due Sun, Apr 5 by 11:59pm)

This week's learning objectives

After Lecture 15 (Mon Mar 30) students will be able to...

  • Explain the three core problems RAG solves: knowledge cutoffs, hallucination on specifics, private data
  • Describe the RAG pipeline: chunk, embed, store, retrieve, augment, generate
  • Choose an appropriate chunking strategy for a given document type
  • Explain why semantic (vector) search outperforms keyword matching for many queries, and when it doesn't
  • Describe how vector databases use ANN algorithms to scale similarity search
  • Distinguish bi-encoders (retrieval) from cross-encoders (re-ranking) and explain the two-stage pattern
  • Describe hybrid search: combining BM25 keyword search with semantic search using Reciprocal Rank Fusion
  • Know when RAG is the right approach vs. fine-tuning, or using both
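Of the techniques listed above, Reciprocal Rank Fusion is the easiest to see in code: it merges a keyword ranking (e.g. from BM25) with a semantic ranking using only rank positions, no scores. A minimal sketch, with made-up rankings to show the mechanics; k=60 is the commonly used constant:

```python
# Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) across
# all input rankings, then documents are sorted by that fused score.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc_a", "doc_c", "doc_b"]   # hypothetical BM25 order
semantic_hits = ["doc_b", "doc_a", "doc_d"]   # hypothetical vector order

fused = rrf([keyword_hits, semantic_hits])
# doc_a sits near the top of both lists, so it is ranked first.
```

Note that a document appearing in only one list (doc_c, doc_d) still gets a score, so hybrid search never silently drops results from either retriever.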

After Lecture 16 (Wed Apr 1) students will be able to...

  • Write effective RAG prompts: grounding instructions, fallback behavior, citation requirements
  • Explain and apply contextual retrieval, HyDE, and multi-query retrieval, and know when each helps
  • Explain how HNSW, IVF, and Product Quantization differ as ANN approaches
  • Describe query routing and why some questions shouldn't go to a vector database at all
  • Identify the three main RAG attack surfaces: prompt injection, data access/privacy, and database curation
  • Apply defenses: metadata filtering, PII redaction, document governance
  • Evaluate a RAG system: retrieval metrics (Precision@k, Recall@k, MRR) vs. generation metrics (faithfulness, relevance)
  • Diagnose common RAG failures: is it a retrieval problem or a generation problem?
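The retrieval-side metrics above can be computed directly from a ranked result list and a ground-truth set of relevant documents. A small sketch with invented data (MRR is the mean of reciprocal rank over many queries; one query is shown here):

```python
# Retrieval metrics for a single query, given the ranked retriever output
# and the set of truly relevant document IDs.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Of the top-k results, what fraction is relevant?
    return sum(d in relevant for d in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Of all relevant documents, what fraction appears in the top-k?
    return sum(d in relevant for d in retrieved[:k]) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant result; 0 if none is retrieved.
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked retriever output (made up)
relevant = {"d1", "d2"}                # ground-truth relevant set

p = precision_at_k(retrieved, relevant, k=2)   # 1 of top-2 relevant -> 0.5
r = recall_at_k(retrieved, relevant, k=2)      # 1 of 2 relevant found -> 0.5
rr = reciprocal_rank(retrieved, relevant)      # first hit at rank 2 -> 0.5
```

These measure retrieval in isolation; generation-side metrics like faithfulness and relevance require judging the model's answer against the retrieved context.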

Discussion Section (Tue Mar 31): Tools for RAG

Hands-on practice with the tools you'll use to build and evaluate RAG systems.

Week 10 Lab: RAG Exploration

Due: Sunday, April 5 by 11:59pm

Weight: Counts as part of the completion-based tasks category. Graded for completion.

This lab is intentionally open-ended. Use it to explore RAG in a direction that connects to your project idea. Build something small, see what breaks, and come away with a sense of what a RAG-based project would actually involve.

What to do:

  • Build a minimal RAG pipeline: chunk some documents, embed them, store in a vector DB, and retrieve against a few queries
  • Experiment with at least one design choice: chunk size, number of retrieved chunks, embedding model, or advanced technique (contextual retrieval, HyDE, hybrid search)
  • Document what you tried and what you noticed: when does retrieval work well? When does it fail?
  • Reflect on connections to your project: could RAG fit into what you're building? What would you need?
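If chunk size is the design choice you experiment with, a fixed-size chunker with overlap is enough to see the trade-off. This sketch works in word units for simplicity; real pipelines often chunk by tokens or by document structure (paragraphs, headings):

```python
# Fixed-size chunking with overlap: slide a window of `size` words,
# advancing by (size - overlap) each step so adjacent chunks share context.
def chunk(text: str, size: int, overlap: int) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(10))  # a 10-"word" stand-in document
small = chunk(doc, size=4, overlap=1)       # more, finer-grained chunks
large = chunk(doc, size=8, overlap=2)       # fewer, coarser chunks
```

Smaller chunks retrieve more precisely but carry less context into the prompt; larger chunks do the opposite. Re-running your queries at a few sizes makes the effect concrete.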

Deliverable: Push your notebook to GitHub (fully merged) and submit your repo link on Gradescope.

Resources for further learning

RAG foundations

Advanced RAG

Tools