WEEK 8: The LLM Landscape and Fine-Tuning Strategies
Welcome back from spring break! This week we zoom out to survey the model landscape and then zoom back in to ask: once you have a model, how do you adapt it? Monday covers the ecosystem of available LLMs - how to read model cards, compare open vs. closed models, and pick the right tool for a task. Wednesday gets practical with fine-tuning: the adaptation spectrum from simple prompting all the way to full fine-tuning, with a focus on parameter-efficient methods like LoRA that make fine-tuning accessible.
This week's checklist
- Attend your oral exam time, if applicable
- Attend Lecture 11 (Mon, Mar 16): The LLM Landscape
- Attend discussion section (Tue, Mar 17): Loading and fine-tuning open models
- Attend Lecture 12 (Wed, Mar 18): Fine-Tuning Strategies
- Submit Week 8 Lab (due Sun, Mar 22 by 11:59pm)
This week's learning objectives
After Lecture 11 (Mon Mar 16) students will be able to...
- Navigate the major model families: GPT series, Claude, Gemini, Llama, Mistral, Falcon
- Compare proprietary and open-source models: cost, capability, customization, privacy
- Read and interpret model cards: what information should a model card provide, and what's missing?
- Make informed model selection decisions for specific use cases
- Explain the foundation model paradigm: pre-train once, adapt for many tasks
After Lecture 12 (Wed Mar 18) students will be able to...
- Navigate the adaptation spectrum: API calls, prompting, PEFT, full fine-tuning, pre-training from scratch
- Explain why full fine-tuning can be expensive and impractical at scale
- Describe LoRA (Low-Rank Adaptation): freeze the base model, train small adapter matrices
- Identify when to use LoRA vs. full fine-tuning vs. just prompting
- Explain catastrophic forgetting and how training choices can prevent it
- Recognize that fine-tuning can degrade safety training, and why that matters
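To make the LoRA objective above concrete, here is a minimal NumPy sketch (an illustration, not a reference implementation): the frozen base weight W is augmented at inference time by a scaled low-rank product, W + (alpha/r) * B @ A, where A and B are the small trainable adapter matrices. All dimensions and hyperparameter values below are illustrative.

```python
import numpy as np

d, k = 1024, 1024   # illustrative base layer dimensions
r = 8               # LoRA rank (r << min(d, k))
alpha = 16          # LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pretrained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                 # trainable, initialized to zero

# With B = 0, the adapted layer starts out identical to the base model,
# so fine-tuning begins exactly where pre-training left off.
W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)

# Parameter savings: train d*r + r*k numbers instead of d*k.
full_params = d * k
lora_params = d * r + r * k
print(f"full fine-tune: {full_params:,} params; LoRA: {lora_params:,} "
      f"({lora_params / full_params:.1%} of full)")
```

Note the zero initialization of B: it is what lets LoRA start from the unchanged base model, which also helps explain why it tends to disturb pre-trained behavior less than full fine-tuning.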
Discussion Section (Tue Mar 17): Loading and Fine-Tuning Open Models
Hands-on practice with open-source model weights in Python.
Activities:
- Load a model from HuggingFace: Use the transformers library to download and run a small open model (e.g., Llama 3.2 1B or Qwen2.5-1.5B). Run a few generations and inspect the output.
- Fine-tune on a small dataset: Use the transformers Trainer or a PEFT/LoRA setup to fine-tune on a toy dataset. Observe how loss changes and compare outputs before and after.
- Discuss tradeoffs: What did you need to get this running? What would break at larger scale? What would you do differently for a real project?
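As a small-scale model of the fine-tuning exercise above, the sketch below trains only LoRA-style adapter matrices on a toy regression task, keeping the "pretrained" weight frozen, and watches the loss fall. This is a hand-rolled NumPy illustration of the mechanics, not the transformers/PEFT API; all names and sizes are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r, lr = 16, 4, 2, 0.05

# Frozen "pretrained" weight, and a shifted target task to adapt to.
W = rng.normal(size=(d_out, d_in))
W_target = W + rng.normal(size=(d_out, d_in)) * 0.5

X = rng.normal(size=(128, d_in))
Y = X @ W_target.T

# Trainable low-rank adapters; B starts at zero, so step 0 is the base model.
A = rng.normal(size=(r, d_in)) * 0.1
B = np.zeros((d_out, r))

def loss(A, B):
    pred = X @ (W + B @ A).T
    return float(np.mean((pred - Y) ** 2))

initial = loss(A, B)
for _ in range(200):
    err = X @ (W + B @ A).T - Y          # (n, d_out) residuals
    grad_W_eff = 2 * err.T @ X / len(X)  # gradient w.r.t. the effective weight
    grad_B = grad_W_eff @ A.T            # chain rule through W_eff = W + B @ A
    grad_A = B.T @ grad_W_eff
    A -= lr * grad_A
    B -= lr * grad_B                     # W itself is never touched

final = loss(A, B)
print(f"loss before fine-tuning: {initial:.3f}, after: {final:.3f}")
assert final < initial
```

In section you will do the real version of this with Trainer or PEFT, but the shape of the run is the same: frozen base weights, small trainable adapters, and a loss curve you should watch go down.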
Week 8 Lab: Exploring the LLM Landscape
Due: Sunday, March 22 by 11:59pm
What you'll do:
- Choose a task (e.g., summarization, code generation, question answering)
- Run the same prompts through 2-3 different models (mix open and proprietary if possible)
- Document the differences: quality, style, refusals, speed, cost per token
- Read and evaluate one model card critically: what's documented well? What's missing?
- Reflection: Based on your experiments, which model would you use for a real project, and why?
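When tabulating cost per token across models, a small helper keeps the arithmetic consistent. The prices below are placeholder numbers purely for illustration, not real vendor pricing; look up current rates on each provider's pricing page before filling in your comparison table.

```python
def request_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical price table ($ per million input/output tokens), for illustration only.
models = {
    "model-a": (0.50, 1.50),
    "model-b": (3.00, 15.00),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${request_cost(2_000, 500, p_in, p_out):.4f} per request")
```

Remember that most providers price input and output tokens differently, so record both counts for each prompt you run.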
Deliverable: Push your notebook to GitHub (fully merged) and submit your repo link on Gradescope.
Note: Use the free tiers of APIs (OpenAI, Anthropic, together.ai, or HuggingFace Inference API) and small open models to keep costs low.
Resources for further learning
LLM landscape
- HuggingFace Open LLM Leaderboard - Browse current benchmarks
- Mistral AI - Leading open-weight model developer
- GPT-4 System Card
- Llama 4 Technical Report - Meta's latest
- Claude Model Card
Fine-tuning and PEFT
- LoRA paper (Hu et al., 2021) - Low-Rank Adaptation of Large Language Models
- HuggingFace PEFT library - LoRA and other PEFT methods
- HuggingFace fine-tuning tutorial
- QLoRA paper - Quantized LoRA for even more efficient fine-tuning
- together.ai - Cheap inference for open models
Staying current
- Hugging Face Blog - New model releases, tutorials
- Artificial Analysis - Model comparisons, pricing, latency
- Chatbot Arena Leaderboard - Human preference rankings
Frameworks
- Stanford Foundation Models Report
- Stanford AI Index 2025 - Annual state of the field