WEEK 10: Retrieval-Augmented Generation (RAG)

RAG is one of the most immediately useful techniques for building real LLM applications. It addresses three fundamental problems: LLMs have knowledge cutoffs, they hallucinate on specific facts, and they can't access private or proprietary information. Retrieval mitigates all three. Monday introduces the core architecture: embed documents, store them in a vector database, retrieve relevant chunks, and inject them into the prompt. Wednesday goes deeper: advanced retrieval strategies, evaluation, and what makes production RAG systems actually work.
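The core loop (embed, store, retrieve, augment) can be sketched in a few lines. This is a toy illustration only: the bag-of-words "embedding" and the in-memory list stand in for a real embedding model and a vector database, and the documents are invented.

```python
# Toy RAG pipeline: embed -> store -> retrieve -> augment.
# A real system swaps in a dense embedding model and a vector DB.
from collections import Counter
import math

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our headquarters are located in Berlin, Germany.",
    "Support is available by email around the clock.",
]

def embed(text: str) -> Counter:
    # Toy embedding: word counts. Stand-in for a dense vector model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Store": a list of (doc, vector) pairs; a vector DB does this at scale.
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# "Augment": inject the retrieved chunks into the prompt before generation.
context = "\n".join(retrieve("How many days do I have to return an item?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```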

This week's checklist

  • Attend Lecture 15 (Mon, Mar 30): RAG Part 1 - Architecture and Foundations
  • Attend discussion section (Tue, Mar 31): Tools for RAG implementation and evaluation
  • Attend Lecture 16 (Wed, Apr 1): RAG Part 2 - Advanced Techniques and Evaluation
  • Submit Week 10 Lab (due Sun, Apr 5 by 11:59pm)

This week's learning objectives

After Lecture 15 (Mon Mar 30) students will be able to...

  • Explain the three core problems RAG solves: knowledge cutoffs, hallucination on specifics, private data
  • Describe the RAG pipeline: chunk, embed, store, retrieve, augment, generate
  • Choose an appropriate chunking strategy for a given document type
  • Explain why semantic (vector) search outperforms keyword matching for many queries, and when it doesn't
  • Describe how vector databases use ANN algorithms to scale similarity search
  • Distinguish bi-encoders (retrieval) from cross-encoders (re-ranking) and explain the two-stage pattern
  • Describe hybrid search: combining BM25 keyword search with semantic search using Reciprocal Rank Fusion
  • Know when RAG is the right approach vs. fine-tuning, or using both
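Of the techniques listed above, Reciprocal Rank Fusion is the easiest to see in code: it merges a keyword ranking (e.g. from BM25) with a semantic ranking using only rank positions, no scores. A minimal sketch, with made-up rankings to show the mechanics; k=60 is the commonly used constant:

```python
# Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) across
# all input rankings, then documents are sorted by that fused score.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc_a", "doc_c", "doc_b"]   # hypothetical BM25 order
semantic_hits = ["doc_b", "doc_a", "doc_d"]   # hypothetical vector order

fused = rrf([keyword_hits, semantic_hits])
# doc_a sits near the top of both lists, so it is ranked first.
```

Note that a document appearing in only one list (doc_c, doc_d) still gets a score, so hybrid search never silently drops results from either retriever.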

After Lecture 16 (Wed Apr 1) students will be able to...

  • Write effective RAG prompts: grounding instructions, fallback behavior, citation requirements
  • Explain and apply contextual retrieval, HyDE, and multi-query retrieval, and know when each helps
  • Explain how HNSW, IVF, and Product Quantization differ as ANN approaches
  • Describe query routing and why some questions shouldn't go to a vector database at all
  • Identify the three main RAG attack surfaces: prompt injection, data access/privacy, and database curation
  • Apply defenses: metadata filtering, PII redaction, document governance
  • Evaluate a RAG system: retrieval metrics (Precision@k, Recall@k, MRR) vs. generation metrics (faithfulness, relevance)
  • Diagnose common RAG failures: is it a retrieval problem or a generation problem?
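The retrieval-side metrics above can be computed directly from a ranked result list and a ground-truth set of relevant documents. A small sketch with invented data (MRR is the mean of reciprocal rank over many queries; one query is shown here):

```python
# Retrieval metrics for a single query, given the ranked retriever output
# and the set of truly relevant document IDs.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Of the top-k results, what fraction is relevant?
    return sum(d in relevant for d in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Of all relevant documents, what fraction appears in the top-k?
    return sum(d in relevant for d in retrieved[:k]) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant result; 0 if none is retrieved.
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked retriever output (made up)
relevant = {"d1", "d2"}                # ground-truth relevant set

p = precision_at_k(retrieved, relevant, k=2)   # 1 of top-2 relevant -> 0.5
r = recall_at_k(retrieved, relevant, k=2)      # 1 of 2 relevant found -> 0.5
rr = reciprocal_rank(retrieved, relevant)      # first hit at rank 2 -> 0.5
```

These measure retrieval in isolation; generation-side metrics like faithfulness and relevance require judging the model's answer against the retrieved context.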

Discussion Section (Tue Mar 31): Tools for RAG

Hands-on practice with the tools you'll use to build and evaluate RAG systems.

Week 10 Lab: RAG Exploration

Due: Sunday, April 5 by 11:59pm

Weight: Counts as part of the completion-based tasks category. Graded for completion.

This lab is intentionally open-ended. Use it to explore RAG in a direction that connects to your project idea. Build something small, see what breaks, and come away with a sense of what a RAG-based project would actually involve.

What to do:

  • Build a minimal RAG pipeline: chunk some documents, embed them, store in a vector DB, and retrieve against a few queries
  • Experiment with at least one design choice: chunk size, number of retrieved chunks, embedding model, or advanced technique (contextual retrieval, HyDE, hybrid search)
  • Document what you tried and what you noticed: when does retrieval work well? When does it fail?
  • Reflect on connections to your project: could RAG fit into what you're building? What would you need?
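If chunk size is the design choice you experiment with, a fixed-size chunker with overlap is enough to see the trade-off. This sketch works in word units for simplicity; real pipelines often chunk by tokens or by document structure (paragraphs, headings):

```python
# Fixed-size chunking with overlap: slide a window of `size` words,
# advancing by (size - overlap) each step so adjacent chunks share context.
def chunk(text: str, size: int, overlap: int) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(10))  # a 10-"word" stand-in document
small = chunk(doc, size=4, overlap=1)       # more, finer-grained chunks
large = chunk(doc, size=8, overlap=2)       # fewer, coarser chunks
```

Smaller chunks retrieve more precisely but carry less context into the prompt; larger chunks do the opposite. Re-running your queries at a few sizes makes the effect concrete.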

Deliverable: Push your notebook to GitHub (fully merged) and submit your repo link on Gradescope.

Resources for further learning

RAG foundations

Advanced RAG

Tools