Theory and Applications of Large Language Models

DS 593 - Spring 2026

Instructor: Prof. Lauren Wheelock

Email: laurenbw@bu.edu

Class Meetings: Monday/Wednesday 12:20-1:35pm

Office Hours: Every weekday with a member of the teaching team

  • Prof. Wheelock: Mon 11-12 in the CDS building, room 1506
  • Bhoomika: Wed 11-12 and Fri 10-11 location TBD
  • Naky: Tue 1-2 and Thu 4-5 location TBD

Course Description

Large language models are reshaping software development, data science, and AI research. In this course, you'll learn how and why LLMs work, then master the skills to adapt and deploy them in real applications. You'll build transformers from scratch to understand the architecture deeply, then move to production techniques: fine-tuning models for specific tasks, building RAG-powered chatbots, and developing AI agents. After this course, you'll have a portfolio of work and the confidence to discuss these techniques in your future work and research.

We'll start with classical NLP and work up through modern transformer architectures, giving you both theoretical understanding and hands-on implementation experience. Throughout, we emphasize responsible AI: understanding bias, safety considerations, and the real-world implications of deployment decisions.

Recommended Co-requisite: Introduction to Machine Learning/AI (DS340 or equivalent)

Learning Objectives

By the end of this course, you will be able to:

  • Build a transformer from scratch and explain how attention mechanisms work
  • Implement a production RAG system with vector databases and semantic search
  • Fine-tune open-source LLMs for specific applications using LoRA and other PEFT techniques
  • Design and red-team prompt engineering strategies, including defenses against injection attacks
  • Critically evaluate LLMs for bias, safety risks, and alignment with human values
  • Maintain a professional technical portfolio demonstrating your work with modern AI tools

What to Expect in This Course

Weekly rhythm:

  • Monday/Wednesday: New concepts through lecture and discussion. Expect icebreakers, group activities, and minimal laptop use. We'll close laptops to focus on ideas, opening them only for specific hands-on activities.
  • Tuesday: Discussion section (optional but highly recommended) for hands-on practice with that week's techniques, troubleshooting, and getting started on labs
  • Friday evenings: Weekly reflection and lab notebook due (pushed to your GitHub repo)
  • Throughout the week: Work on your GitHub portfolio, explore resources, engage on Piazza. Office hours are available every weekday with a member of the teaching team!

Weekly deliverables: Each week you'll complete:

  • A personal reflection (300-500 words) on what you're learning
  • A lab notebook documenting your experiments and implementation work
  • See the detailed weekly guides on our website for specific prompts, resources, and learning objectives for each week

Twice per semester: You'll take your exploratory weekly labs and polish them into portfolio pieces - cohesive, well-documented projects ready for peer review and professional portfolios.

Two midterms, no final: In-class exams (Week 6 and Week 12) test your conceptual understanding on paper. After each exam, you will have the option to redo one exam topic orally to demonstrate post-exam learning.

One final project: The capstone of the course where you apply everything you've learned to build something substantial, whether that's training a model from scratch, building a production RAG system, creating an AI agent, or diving deep into research. You'll work through ideation, proposal, development, and presentation stages, with checkpoints to keep you on track. This becomes a portfolio piece you can show future employers or use as a foundation for further research.

AI Use Policy

For coding: There are no restrictions on AI use to assist in your coding. Correspondingly, I have high expectations for the quality of the final products you will be able to produce in course projects, especially the final project. Using AI-powered coding tools will be especially helpful if you are building a project that uses non-LLM software components, such as building a web interface or app.

For reflections: I do ask that you write your weekly reflections without AI, in your own voice. These assignments are not graded for content, and I will use them to aid in my own teaching and reflection on the course, and to understand what material is most valuable to you. These are about your experiences and opinions. I don't care about grammar, and they can be stream-of-consciousness if need be.

For exams: There will be no technology or cheat-sheet use on exams so that I can evaluate your understanding of the theory we cover.

Course Tools

  • GitHub: For your labs, reflections, portfolio pieces, and final project
  • [Piazza](https://piazza.com/class/mkegpx14bz48t) for questions, discussions, and announcements
  • Gradescope for exam and portfolio piece grading
  • Course website:

Course Structure

See the website's Course Schedule for detailed day-by-day topics and due dates.

Part I: Foundations (Weeks 1-3)

Where are we going? How are we going to work together?

  • Welcome, GitHub and collaboration setup
  • Introduction to NLP and the current LLM landscape

How did we process language before transformers?

  • AI-assisted development tools and best practices
  • Classical NLP: bag-of-words, TF-IDF, naive Bayes, tokenization deep dive

How do neural networks learn from text?

  • Deep learning fundamentals: backpropagation, gradient descent
  • Word embeddings: Word2vec, GloVe, distributional hypothesis
  • Sequence-to-sequence models and the bottleneck problem

Part II: Transformer Architecture (Weeks 4-6)

What makes transformers so powerful?

  • Attention mechanisms: Query-Key-Value framework, scaled dot-product attention
  • Self-attention and multi-head attention
  • Transformer architecture: encoder-decoder blocks, residual connections, layer normalization

How do we actually build and use transformers?

  • Implementing transformers from scratch
  • Transformer variants: BERT, GPT, T5
  • Using pre-trained models with HuggingFace, visualizing attention with BertViz
  • Philosophy of AI: consciousness, understanding, Chinese Room, Turing test

Portfolio Piece 1 due (Week 5)

First Midterm (Week 6)

Part III: LLMs at Scale (Weeks 6-8)

How do you train a model that costs millions of dollars?

  • Pre-training LLMs: data sources, cleaning pipelines, scaling laws (Kaplan vs Chinchilla)
  • Training at scale: distributed training, compute costs, environmental impact
  • Post-training and RLHF: instruction tuning, reward modeling, reinforcement learning
  • Constitutional AI: principles-based alignment vs human preferences

How do we evaluate and compare LLMs?

  • Evaluation frameworks: benchmarks (MMLU, HellaSwag, TruthfulQA), Goodhart's Law
  • The LLM landscape: GPT, Claude, LLaMA, foundation models, open vs. closed
  • Fine-tuning strategies and PEFT: when to fine-tune, LoRA, catastrophic forgetting, safety considerations

Part IV: Applications (Weeks 8-11)

How do we make LLMs do what we want, and what can go wrong?

  • Prompt engineering: core principles, few-shot learning, chain-of-thought reasoning
  • Prompt injection and jailbreaking: attack surface, direct/indirect injection, defense strategies
  • Safety, alignment, and red-teaming: whose values?, real-world harms, alignment tax

How can LLMs access and use external knowledge?

  • Retrieval-augmented generation (RAG): vector databases, semantic search, retrieval augmentation
  • Hallucination mitigation and advanced RAG architectures
  • Evaluating RAG system performance

How can LLMs act autonomously in the world?

  • AI agents: tool use, reasoning loops, multi-agent systems
  • Memory systems and long-term context
  • Real-world agent applications and limitations

Portfolio Piece 2 due (Week 10)

Final Project Proposal due (Week 11)

Part V: Deployment and Capstone (Weeks 12-14)

How do we responsibly deploy what we've built?

  • Deployment considerations: production systems, API design, monitoring
  • Safety in production: content filtering, rate limiting, abuse prevention
  • Regulatory landscape and ethical considerations

Second Midterm (Week 12)

What's emerging in the field right now?

  • Guest lecture or discussion of current developments
  • Final project development and peer consultation

What can you build with everything you've learned?

  • Final project presentations and demonstrations

Final Project due (May 1)

Assessment Structure

Component                          Weight
Demonstrating Learning Process     30%
  Weekly Reflections + Labs        10%
  Participation                    10%
  Portfolio Pieces                 10%
Demonstrating Mastery              40%
  Midterm 1                        20%
  Midterm 2                        20%
Final Project                      30%
Total                              100%
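
As a rough illustration of how these weights combine, here is a minimal Python sketch of a weighted course score. The function name, the dictionary keys, and the sample scores are hypothetical and chosen only for illustration; the actual gradebook is maintained by the teaching team and may be computed differently.

```python
# Weights (percent of the course grade) copied from the assessment table.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "weekly_reflections_labs": 10,
    "participation": 10,
    "portfolio_pieces": 10,
    "midterm_1": 20,
    "midterm_2": 20,
    "final_project": 30,
}


def weighted_course_score(scores):
    """Combine per-component scores (each on a 0-100 scale) into a 0-100 course score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up example scores, not real student data.
example_scores = {
    "weekly_reflections_labs": 100,
    "participation": 90,
    "portfolio_pieces": 85,
    "midterm_1": 80,
    "midterm_2": 88,
    "final_project": 92,
}
example_total = weighted_course_score(example_scores)
print(round(example_total, 1))  # 88.7
```

Because the final project carries 30% on its own, a 10-point change in the project score moves the course score by 3 points, the same as a 15-point change on a single midterm.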

Weekly Reflections + Lab Notebooks (10%)

There are no traditional homework assignments. Each week you will keep a GitHub repo for this course that includes:

  • Reflections: Weekly reflections (300-500 words each) documenting your learning, questions, and connections to other topics
  • Lab Notebooks: Well-documented Jupyter notebooks showing your thought process and experiments with the course material (20-50 lines of working code plus comments)

Timing: Complete each week's reflection and lab notebook by Friday evening. Submit by pushing to your GitHub repo.

Freedom to explore: I will give suggested questions and resources for exploration but you are free to take these assignments in another direction and follow your interests as long as your work is related to the topics covered. For example, if you are particularly interested in the philosophical or linguistic aspects of language models, you could make that a theme throughout all your reflections and data work.

Grading: These assignments are graded for completion only (credit/no credit), not for content. The teaching team will read your work and leave constructive feedback. If there is a particular type of feedback you are interested in for your own growth, let us know!

Participation (10%)

I don't expect everyone to engage in the same way, so focusing on 2-3 of these will merit full participation credit:

  • Participation in lecture: Consistent attendance, asking and answering questions, participating in groupwork
  • Participation in discussion: Consistent attendance and engagement
  • Office hours: Coming to office hours to ask questions and discuss project work
  • Piazza engagement: Asking or answering questions that help the community learn
  • Peer support: Helping classmates troubleshoot code or understand concepts

Participation Self-Assessment: At the middle and the end of the semester, you will submit a short reflection (1-2 paragraphs) making a case for your participation grade based on a rubric I will provide, giving specific examples of your contributions. The teaching team will review and confirm or adjust your self-assessment.

Portfolio Pieces (10% total)

Two portfolio pieces where you build upon your past labs to create a polished project. Portfolio pieces must be completed individually. You will share your work with peers for feedback, and providing thoughtful peer reviews is part of your grade.

Each portfolio piece will be graded on a detailed rubric (100 points total) covering: conceptual understanding, technical implementation, code quality, documentation & communication, critical analysis, originality & depth, and peer reviews. See the detailed rubric document for point breakdown and grading criteria at each level.

Exams (40% total)

These exams are designed to check your mastery of theoretical material, while project work demonstrates your mastery of applications.

  • First Midterm (20% - Week 6, Feb 23):
  • Second Midterm (20% - Week 12, Apr 15):

No reference materials will be allowed on the exams.

Both exams will occur in-class on the dates shown (75 minutes). You can mark these dates in your calendar now, since they are firm. If you have existing accommodations that impact exams, please let me know as soon as possible, and no later than two weeks before the exam.

Exam Structure: Exams are organized into standards, each covering a specific topic area. This structure helps you identify which concepts you've mastered and which need more work, and enables you to select one topic for re-examination (see oral exam policy below).

Final Project (30%)

Final projects can be completed individually or in groups of 2-3 people. Group projects should be more ambitious in scope, with clear division of labor documented.

Project options might include:

  • Train a small language model from scratch and explore what's possible without relying on pre-trained models
  • Build a RAG-based chatbot with prompt engineering that minimizes hallucinations for a particular application
  • Fine-tune an open-source LLM for a particular application and demonstrate improved performance
  • Build an LLM agent for a specific task with LangChain or MCP and a web interface
  • Deep dive into a recent LLM research paper with implementation and novel analysis

Project checkpoints:

  • Week 8 (Mar 20): Project ideation checkpoint - submit 2-3 project ideas, form teams (if applicable)
  • Week 11 (Apr 10): Project proposal - one-page proposal including problem statement, proposed approach, evaluation plan, timeline, and (if group) division of labor. Dataset acquired and preliminary exploration complete.
  • Week 14 (Apr 27-29): Final presentations in class
  • May 1: Final project write-ups due

Grading: Projects will be assessed on a rubric (50 points total) covering: scope & ambition, design decisions, technical execution, use of course concepts, evaluation & analysis, iteration & reflection, ethics & limitations, and documentation & presentation. See the detailed rubric document for point breakdown and grading criteria, and the project guide for scope expectations and tips.

For group projects, individual grades may differ based on contribution (assessed through peer evaluations).

Paper presentation alternative: A paper presentation is available as an alternative for students who are more theoretically inclined. This involves critical analysis of a significant LLM paper (not just summary) plus at least one of: implementation/demo, novel visualizations/teaching materials, or synthesis with additional sources. This is expected to take effort equivalent to a final project. If you are interested in this option, please reach out to me by Week 4 so we can select an appropriate paper and make time in class for your presentation, which may be a better fit for an earlier point in the term than the final week.

Additional Course Policies

Extensions and Late Work

Weekly reflections and labs: These will receive 100% credit if they meet the length criteria, are on topic, and are submitted by the deadline, with up to 90% credit one day late, 80% credit two days late, and no credit after more than two days. Since these tasks are lightweight I do not expect to offer extensions except in extreme circumstances.
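
The late-credit schedule above can be sketched as a small Python function. The function name and the convention of counting lateness in whole days are assumptions made for this illustration; how the teaching team actually counts days may differ.

```python
def late_credit_cap(days_late):
    """Return the maximum credit (in percent) for work submitted `days_late` whole days late.

    Assumes `days_late` is an integer count of days past the deadline
    (0 or negative means on time). This convention is hypothetical.
    """
    if days_late <= 0:
        return 100  # on time
    if days_late == 1:
        return 90   # one day late
    if days_late == 2:
        return 80   # two days late
    return 0        # more than two days late: no credit


# Caps for submissions 0 to 3 days late.
caps = [late_credit_cap(d) for d in range(4)]
print(caps)  # [100, 90, 80, 0]
```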

Portfolio pieces: Same late policy as weekly work (100% on time, 90% one day late, 80% two days late, 0% after). Since these assignments are posted for peer review, turning them in late impedes your peers' ability to provide feedback, so I will rarely offer extensions.

Final projects: Projects submitted by the last day of class (May 1) will receive up to 100% credit. Since this course does not have a final exam, I must issue final grades within 48 hours of the end of the term, so projects more than 48 hours late will result in a (temporary) Incomplete, and will receive up to 70% credit once the project is submitted. I highly encourage you to submit projects by the deadline, even if you feel they could be improved, with your reflections on what you would have done with more time or how you could have planned differently.

Exams: There will be no make-up exams without prior arrangement. If the issue arises within 24 hours of the exam time, a documented emergency is required.

Calculating and communicating grades

I will be tracking your course grades in a spreadsheet and will automate email updates so you can see your gradebook status approximately every 2 weeks. If you receive these emails and believe there is a factual error on your grade sheet (for example, you see a late penalty on a lab you believe you completed on time) please reply to the email and I will look into it.

Exams, portfolio pieces, and the final project will be graded on rubric forms on Gradescope and your score will automatically be sent to you through that tool. You will also see these scores reflected in the gradebook emails that follow.

"Curving" exams and course grades

I reserve the right to add a fixed number of "free" points to linearly curve exam scores - this will never result in a lower grade for anyone. It is my intention to design exams so this policy should not be needed.

I will use the standard map from numeric grades to letter grades (>=93 is A, >=90 is A-, etc) to produce final grades for the class. This final distribution will not be curved or capped.

Regrade requests

You have the right to request a re-grade of any rubric-based assignment or exam. Regrade requests must be submitted using the Gradescope interface, not by email, and must be submitted within one week of grading. If you request a re-grade for a portion of an assignment, then we may review the entire assignment, not just the part in question. This may result in a lower grade.

Oral exam re-test

You may elect after either the first or second exam to re-examine one topic during a personalized oral exam. (Exams will be clearly broken up into equally-weighted topics.) The oral exam may consist of questions about your original answers, related questions that did not appear on the exam, or discussion of code or other work that relates to the topic. You must request a re-test within one week of exam grades being posted, and the re-tests will be scheduled roughly a week later. More information on this option will be provided during the semester.

Corrections

There are no exam corrections or assignment corrections in this course. With the exception of the oral exam option, assignment and exam grades are final.

Classroom Presence and Engagement

This course emphasizes active learning through discussions, activities, and collaborative work. When you're here, you're here. This means:

  • Laptops and devices should be closed unless we're actively using them for course activities
  • I may occasionally cold-call on students (gently!) to foster discussion
  • If you're too busy to engage fully in class activities, it's better to skip that session and catch up later

I understand that life happens and sometimes you need to miss class. That's okay! But when you do attend, I ask that you be mentally present and ready to participate.

Absences

This course follows BU's policy on religious observance. Otherwise, it is generally expected that students attend lectures and discussion sections. There is no need to email me in advance for missing a class due to illness or other conflict (unless there is an exam or presentation). If you miss a lecture, please review the lecture notes and confer with other students in the class. Lectures will not be recorded.

If you expect to miss more than two lectures in a row, please let me know as soon as possible so we can make a plan and I can help give you any support you need.

In the unlikely event that I cannot teach in person on a particular day, I will send a Piazza announcement with further instructions.

Collaboration

You are encouraged to discuss concepts and approaches with classmates, but all written work and code must be your own (unless it's a group project). For portfolio pieces, you may discuss general strategies but not share code or specific solutions. Cite any external resources you use, including AI dev tools.

Academic Integrity

This course follows all BU policies regarding academic honesty. Plagiarism or cheating of any kind will result in a failing grade for the assignment and possible referral to the university.

Accommodations

If you need accommodations, please let me know as soon as possible. You have the right to have your needs met, and the sooner you let me know, the sooner I can make arrangements to support you. Students with documented disabilities should contact the Office for Disability Services (ODS) at access@bu.edu or (617) 353-3658. Scheduling of alternative exam times and environments due to accommodations is handled by ODS directly.

Wellness

Your wellbeing matters. If you are struggling with course material, personal issues, or anything else, please reach out. I'm happy to work with you on extensions, alternative arrangements, or just to listen.