WEEK 8: The LLM Landscape and Fine-Tuning Strategies
Welcome back from spring break! This week we zoom out to survey the model landscape and then zoom back in to ask: once you have a model, how do you adapt it? Monday covers the ecosystem of available LLMs - how to read model cards, compare open vs. closed models, and pick the right tool for a task. Wednesday gets practical with fine-tuning: the adaptation spectrum from simple prompting all the way to full fine-tuning, with a focus on parameter-efficient methods like LoRA that make fine-tuning accessible.
This week's checklist
- Attend your oral exam time, if applicable
- Attend Lecture 11 (Mon, Mar 16): The LLM Landscape
- Attend discussion section (Tue, Mar 17): Loading and fine-tuning open models
- Attend Lecture 12 (Wed, Mar 18): Fine-Tuning Strategies
- Submit Week 8 Lab (due Sun, Mar 22 by 11:59pm)
This week's learning objectives
After Lecture 11 (Mon Mar 16) students will be able to...
- Navigate the major model families: GPT series, Claude, Gemini, Llama, Mistral, Falcon
- Compare proprietary and open-source models: cost, capability, customization, privacy
- Read and interpret model cards: what information should a model card provide, and what's missing?
- Make informed model selection decisions for specific use cases
- Explain the foundation model paradigm: pre-train once, adapt for many tasks
After Lecture 12 (Wed Mar 18) students will be able to...
- Navigate the adaptation spectrum: API calls, prompting, PEFT, full fine-tuning, pre-training from scratch
- Explain why full fine-tuning can be expensive and impractical at scale
- Describe LoRA (Low-Rank Adaptation): freeze the base model, train small adapter matrices
- Identify when to use LoRA vs. full fine-tuning vs. just prompting
- Explain catastrophic forgetting and how training choices can prevent it
- Recognize that fine-tuning can degrade safety training, and why that matters
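To make the LoRA objective above concrete, here is a minimal NumPy sketch (an illustration, not a reference implementation): the frozen base weight W is augmented at inference time by a scaled low-rank product, W + (alpha/r) * B @ A, where A and B are the small trainable adapter matrices. All dimensions and hyperparameter values below are illustrative.

```python
import numpy as np

d, k = 1024, 1024   # illustrative base layer dimensions
r = 8               # LoRA rank (r << min(d, k))
alpha = 16          # LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pretrained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                 # trainable, initialized to zero

# With B = 0, the adapted layer starts out identical to the base model,
# so fine-tuning begins exactly where pre-training left off.
W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)

# Parameter savings: train d*r + r*k numbers instead of d*k.
full_params = d * k
lora_params = d * r + r * k
print(f"full fine-tune: {full_params:,} params; LoRA: {lora_params:,} "
      f"({lora_params / full_params:.1%} of full)")
```

Note the zero initialization of B: it is what lets LoRA start from the unchanged base model, which also helps explain why it tends to disturb pre-trained behavior less than full fine-tuning.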
Discussion Section (Tue Mar 17): Loading and Fine-Tuning Open Models
Hands-on practice with open-source model weights in Python.
Activities:
- Load a model from HuggingFace: Use the transformers library to download and run a small open model (e.g., Llama 3.2 1B or Qwen2.5-1.5B). Run a few generations and inspect the output.
- Fine-tune on a small dataset: Use the transformers Trainer or a PEFT/LoRA setup to fine-tune on a toy dataset. Observe how loss changes and compare outputs before and after.
- Discuss tradeoffs: What did you need to get this running? What would break at larger scale? What would you do differently for a real project?
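As a small-scale model of the fine-tuning exercise above, the sketch below trains only LoRA-style adapter matrices on a toy regression task, keeping the "pretrained" weight frozen, and watches the loss fall. This is a hand-rolled NumPy illustration of the mechanics, not the transformers/PEFT API; all names and sizes are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r, lr = 16, 4, 2, 0.05

# Frozen "pretrained" weight, and a shifted target task to adapt to.
W = rng.normal(size=(d_out, d_in))
W_target = W + rng.normal(size=(d_out, d_in)) * 0.5

X = rng.normal(size=(128, d_in))
Y = X @ W_target.T

# Trainable low-rank adapters; B starts at zero, so step 0 is the base model.
A = rng.normal(size=(r, d_in)) * 0.1
B = np.zeros((d_out, r))

def loss(A, B):
    pred = X @ (W + B @ A).T
    return float(np.mean((pred - Y) ** 2))

initial = loss(A, B)
for _ in range(200):
    err = X @ (W + B @ A).T - Y          # (n, d_out) residuals
    grad_W_eff = 2 * err.T @ X / len(X)  # gradient w.r.t. the effective weight
    grad_B = grad_W_eff @ A.T            # chain rule through W_eff = W + B @ A
    grad_A = B.T @ grad_W_eff
    A -= lr * grad_A
    B -= lr * grad_B                     # W itself is never touched

final = loss(A, B)
print(f"loss before fine-tuning: {initial:.3f}, after: {final:.3f}")
assert final < initial
```

In section you will do the real version of this with Trainer or PEFT, but the shape of the run is the same: frozen base weights, small trainable adapters, and a loss curve you should watch go down.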
Week 8 Lab: Exploring the LLM Landscape
Due: Sunday, March 22 by 11:59pm
What you'll do:
- Choose a task (e.g., summarization, code generation, question answering)
- Run the same prompts through 2-3 different models (mix open and proprietary if possible)
- Document the differences: quality, style, refusals, speed, cost per token
- Read and evaluate one model card critically: what's documented well? What's missing?
- Reflection: Based on your experiments, which model would you use for a real project, and why?
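When tabulating cost per token across models, a small helper keeps the arithmetic consistent. The prices below are placeholder numbers purely for illustration, not real vendor pricing; look up current rates on each provider's pricing page before filling in your comparison table.

```python
def request_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical price table ($ per million input/output tokens), for illustration only.
models = {
    "model-a": (0.50, 1.50),
    "model-b": (3.00, 15.00),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${request_cost(2_000, 500, p_in, p_out):.4f} per request")
```

Remember that most providers price input and output tokens differently, so record both counts for each prompt you run.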
Deliverable: Push your notebook to GitHub (fully merged) and submit your repo link on Gradescope.
Note: Use the free tiers of APIs (OpenAI, Anthropic, together.ai, or HuggingFace Inference API) and small open models to keep costs low.
Resources for further learning
LLM landscape
- HuggingFace Open LLM Leaderboard - Browse current benchmarks
- Mistral AI - Leading open-weight model developer
- GPT-4 System Card
- Llama 4 Technical Report - Meta's latest
- Claude Model Card
Fine-tuning and PEFT
- LoRA paper (Hu et al., 2021) - Low-Rank Adaptation of Large Language Models
- HuggingFace PEFT library - LoRA and other PEFT methods
- HuggingFace fine-tuning tutorial
- QLoRA paper - Quantized LoRA for even more efficient fine-tuning
- together.ai - Cheap inference for open models
Staying current
- Hugging Face Blog - New model releases, tutorials
- Artificial Analysis - Model comparisons, pricing, latency
- Chatbot Arena Leaderboard - Human preference rankings
Frameworks
- Stanford Foundation Models Report
- Stanford AI Index 2025 - Annual state of the field