Assessment Rubrics

CDS 593 - Spring 2026

This document contains the rubrics used to evaluate your work in this course: portfolio pieces, the final project (or paper alternative), and participation. Use these rubrics to understand expectations and guide your work.

Portfolio Piece Rubric

Total: 25 points

Each portfolio piece is assessed on five categories. The same rubric applies to both Portfolio Pieces 1 and 2.

Category	Excellent (5)	Proficient (4)	Developing (3)	Beginning (1-2)
Conceptual Understanding	Explains why specific methods were chosen; connects to course material; reasoning is clear and accurate	Shows solid grasp of concepts; explanations are mostly accurate	Partial understanding; some misconceptions; explanations lack depth	Significant conceptual errors; misapplies methods
Technical Implementation	Code runs without errors; all components work correctly; handles edge cases	Code works for main use cases; minor bugs don't affect results	Code runs with some errors; missing components; bugs affect results	Code doesn't run or is severely incomplete
Code Quality & Documentation	Clear structure and naming; notebook tells a story (problem, approach, results, analysis); visualizations support the narrative	Readable code with good organization; good explanations of main steps	Hard to follow; sparse explanations; reader must infer what's happening	Disorganized; minimal or no explanation; no visualizations
Critical Analysis	Interprets results thoughtfully; discusses limitations and tradeoffs; compares approaches	Reasonable interpretation; mentions some limitations	Reports metrics without explaining what they mean	Shows outputs without analysis
Peer Reviews	Constructive feedback on 2 projects; identifies specific strengths and areas for improvement	Adequate feedback on 2 projects; notes what worked and what could improve	Vague or surface-level feedback; may only review 1 project	No peer reviews or unhelpful feedback

What We're Looking For

Conceptual Understanding: We want to see that you understand the why, not just the what. Why did you choose this model? Why these hyperparameters? What are the tradeoffs?

Technical Implementation: Your code should run cleanly when we execute it. Test your notebook from top to bottom before submitting.

Code Quality & Documentation: Write code that your classmates could read and learn from. Your notebook should read like a report, not a code dump. Guide the reader through your thinking.

Critical Analysis: Don't just report numbers; interpret them. What do the results tell you? What are the limitations? What would you do differently?

Peer Reviews: Provide feedback that would actually help your classmates improve. Be specific about what worked and what could be better.
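One way to act on the Technical Implementation advice above is to re-run your notebook from a fresh kernel before submitting. The sketch below is a minimal example, assuming Jupyter and nbconvert are installed and on your PATH. The function names, the file name portfolio_piece_1.ipynb, the output name executed_check, and the 600-second timeout are all placeholders chosen for illustration, not course requirements.

```python
import subprocess


def build_execute_command(notebook_path, timeout_s=600):
    """Build an nbconvert command that runs every cell in order in a new kernel.

    The executed copy is written next to the original under the assumed
    name 'executed_check', so the original notebook is left untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        "--output", "executed_check",
        notebook_path,
    ]


def runs_cleanly(notebook_path, timeout_s=600):
    """Return True if the notebook executes top to bottom without an error."""
    result = subprocess.run(
        build_execute_command(notebook_path, timeout_s),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # nbconvert reports the failing cell and traceback on stderr.
        print(result.stderr)
    return result.returncode == 0
```

Calling runs_cleanly("portfolio_piece_1.ipynb") returns False and prints the error output if any cell raises an error. That is roughly what a grader would see when running your notebook on a clean machine.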

Final Project Rubric

Total: 50 points

The final project is assessed on eight categories. Scope & Ambition and Evaluation & Analysis are each worth 10 points (scored on a 1-10 scale) because they're where the most important learning happens. The remaining categories are worth 5 points each. Proposal and checkpoint deliverables are graded separately for completion and are not included in this rubric.

See the project guide for scope expectations by team size, project ideas, and tips.
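To check the weighting described above, the following sketch adds up the category maximums and totals a set of scores. The category names and point values come from this rubric. The helper total_score, the example scores, and the assumption that 1 is the lowest possible score in each category (the lowest value the rubric tables list) are for illustration only.

```python
# Maximum points per category, as listed in the final project rubric.
FINAL_PROJECT_MAX = {
    "Scope & Ambition": 10,
    "Design Decisions": 5,
    "Technical Execution": 5,
    "Use of Course Concepts": 5,
    "Evaluation & Analysis": 10,
    "Iteration & Reflection": 5,
    "Ethics & Limitations": 5,
    "Documentation & Presentation": 5,
}


def total_score(scores, max_points=FINAL_PROJECT_MAX):
    """Sum per-category scores after checking each lies between 1 and its maximum.

    Assumption: 1 is the minimum score, since that is the lowest value
    the rubric tables list.
    """
    if set(scores) != set(max_points):
        raise ValueError("scores must cover exactly the rubric categories")
    for category, score in scores.items():
        if not 1 <= score <= max_points[category]:
            raise ValueError(
                f"{category}: {score} is outside 1-{max_points[category]}"
            )
    return sum(scores.values())


# Hypothetical self-estimate, made up for illustration.
example_scores = {
    "Scope & Ambition": 8,
    "Design Decisions": 4,
    "Technical Execution": 4,
    "Use of Course Concepts": 5,
    "Evaluation & Analysis": 7,
    "Iteration & Reflection": 4,
    "Ethics & Limitations": 3,
    "Documentation & Presentation": 4,
}

print(sum(FINAL_PROJECT_MAX.values()))  # 50
print(total_score(example_scores))      # 39
```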

Scope & Ambition (10 points)

This is where team-size expectations are reflected. A pair doing a solo-sized project, or a trio doing a pair-sized project, will lose points here.

Score	Description
9-10	Tackles a genuinely challenging problem with clear motivation. Scope is appropriate for team size. Goes beyond a tutorial or obvious first approach. Solo projects show depth; team projects show depth and breadth.
7-8	Reasonable challenge with a clear problem statement. Some creativity or a solid execution of a non-trivial approach. Scope is mostly appropriate for team size.
5-6	Too simple, too ambitious, or scope doesn't match team size. Follows existing examples closely without adding much.
1-4	Inappropriate scope. Minimal originality. Could have been done in an afternoon, or was so ambitious that nothing works.

Design Decisions (5 points)

Score	Description
5	Explains why specific tools, models, and strategies were chosen. Considered alternatives and can articulate tradeoffs. Write-up shows clear reasoning, not just "I used X."
4	Explains most choices with reasonable justification. Some decisions are stated without alternatives considered.
3	Describes what was done but not why. Limited evidence of considering alternatives.
1-2	No justification for choices. Appears to have used defaults without thought.

Technical Execution (5 points)

Score	Description
5	Code runs reliably. Architecture is sensible and well-organized. Implementation demonstrates skill and care.
4	Solid implementation. Mostly works. Reasonable structure with minor issues.
3	Partial implementation. Significant bugs or architectural problems that affect results.
1-2	Doesn't work, or major components are missing.

Use of Course Concepts (5 points)

Score	Description
5	Deep application of multiple course concepts. Makes connections across topics (e.g., links attention mechanisms to retrieval strategy, or connects alignment concepts to evaluation choices).
4	Good application of relevant concepts. Demonstrates solid understanding.
3	Basic application. Some misunderstandings or limited depth. Uses course vocabulary without demonstrating understanding.
1-2	Minimal connection to course material. Fundamental conceptual errors.

Evaluation & Analysis (10 points)

Double-weighted because this is where most projects fall short. "I built it and it works" is not enough.

Score	Description
9-10	Rigorous evaluation with appropriate metrics and baselines. Includes error analysis: what kinds of inputs does it fail on, and why? Discusses limitations honestly. Results are reproducible.
7-8	Solid evaluation with reasonable metrics and at least one baseline comparison. Mentions limitations. Some error analysis.
5-6	Basic evaluation. Reports metrics but doesn't dig into what they mean. No baseline, or baseline is trivial. Limitations mentioned in passing.
3-4	Minimal evaluation. Shows outputs without measuring quality. No comparison.
1-2	No meaningful evaluation.
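As a rough illustration of what a baseline comparison and a first pass at error analysis can look like, here is a minimal sketch for a classification task. The labels, predictions, and group tags are toy values made up for this example, and the helper names are hypothetical. A majority-class baseline is only a floor: the rubric treats a trivial baseline as insufficient on its own, so in a real project you would also compare against a stronger, task-appropriate baseline.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for all n test examples."""
    label, _ = Counter(y_train).most_common(1)[0]
    return [label] * n


def errors_by_group(y_true, y_pred, groups):
    """Count misclassified examples per group tag (e.g., input length bucket)."""
    return Counter(g for t, p, g in zip(y_true, y_pred, groups) if t != p)


# Toy, made-up data for illustration only.
y_train = ["pos", "pos", "neg", "pos"]
y_test = ["pos", "neg", "pos", "neg"]
model_pred = ["pos", "neg", "pos", "pos"]
groups = ["short", "short", "long", "long"]

baseline_pred = majority_baseline(y_train, len(y_test))
print(accuracy(y_test, baseline_pred))              # 0.5
print(accuracy(y_test, model_pred))                 # 0.75
print(errors_by_group(y_test, model_pred, groups))  # Counter({'long': 1})
```

Reporting the model's number next to the baseline shows how much the model actually adds. Grouping the failures gives you a starting point for explaining which kinds of inputs it fails on and why.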

Iteration & Reflection (5 points)

Score	Description
5	Write-up tells the story of the process, not just the final product. Documents what was tried and abandoned, what didn't work and why, and what the team would do differently with more time. Shows genuine learning from failures.
4	Mentions some iteration. Discusses at least one thing that didn't work and how the approach changed.
3	Mostly describes the final system. Limited evidence of iteration or learning from mistakes.
1-2	No evidence of trying more than one approach. No reflection on process.

Ethics & Limitations (5 points)

Score	Description
5	Thoughtful consideration of who's affected, what could go wrong, and what the system doesn't capture. Addresses bias, safety, or fairness concerns specific to this project (not boilerplate). Considers deployment implications.
4	Discusses relevant ethical considerations with some specificity. Identifies real limitations.
3	Surface-level ethics discussion. Generic statements that could apply to any LLM project.
1-2	No meaningful engagement with ethics or limitations.

Documentation & Presentation (5 points)

Score	Description
5	Clear, well-organized write-up that tells a compelling story. Presentation is engaging and well-paced. Code is readable and documented. Someone could pick up your repo and understand what you did.
4	Good write-up and presentation. Organized and clear, with minor gaps.
3	Adequate but unclear in places. Reader has to work to follow the narrative.
1-2	Disorganized. Hard to follow. Code is a mess.

Group Projects

For group projects, include a brief statement of who contributed what. Each team member will also complete a peer evaluation. Individual grades may be adjusted based on contribution.

Paper Presentation Rubric (Alternative to Final Project)

Total: 30 points

This option is for students who prefer a more theoretical approach. It requires critical analysis of a significant LLM paper plus a working demo. Contact the instructor by Week 4 to discuss paper selection and scheduling. Paper presentations can only be done solo (not in a group).

Category	Excellent (5)	Proficient (4)	Developing (3)	Beginning (1-2)
Proposal & Preparation	Timely paper selection; clear proposal explaining approach and demo plan; well-prepared for scheduled slot	Good proposal and preparation with minor gaps	Late or incomplete proposal; some preparation issues	Missing or late proposal; significantly underprepared
Paper Understanding	Demonstrates deep understanding of the paper's contributions, methods, and context; can answer questions beyond what's in the paper	Solid understanding of main contributions and methods; minor gaps in technical details	Surface-level understanding; summarizes but doesn't fully grasp key ideas	Misunderstands core concepts; significant errors in explanation
Critical Analysis	Identifies strengths, limitations, and open questions; compares to related work; situates paper in broader context; offers original insights	Good discussion of strengths and limitations; some comparison to related work	Basic critique; mostly descriptive rather than analytical	No meaningful critique; just summarizes the paper
Implementation/Demo	Working demo that illustrates key concepts; helps audience understand the paper's contributions in practice	Functional demo that adds value to the presentation	Minimal demo; doesn't go much beyond showing existing outputs	No demo or demo doesn't work
Teaching & Accessibility	Makes complex material accessible; clear visualizations and explanations; audience leaves with solid understanding	Good explanations; most classmates can follow along	Some parts unclear or too technical for audience	Inaccessible to classmates; poor explanations
Presentation Delivery	Clear, well-organized, engaging; appropriate pacing; handles questions well	Good organization and clarity; answers most questions adequately	Somewhat disorganized or hard to follow; struggles with some questions	Confusing presentation; unable to answer basic questions

Participation Rubric

Total: 10 points (5 points for each half of the semester, 10% of course grade)

Participation is assessed through self-reflection. At the midpoint and end of the semester, you'll submit a short reflection (1-2 paragraphs) making a case for your participation score, with specific examples of your contributions. The teaching team will review and confirm or adjust your self-assessment.

Ways to Participate

You don't need to engage in every category. Focus on 2-3 that fit your learning style:

Lecture participation: Consistent attendance, asking and answering questions, engaging in group work
Discussion section: Consistent attendance and active engagement
Office hours: Coming to office hours to ask questions or discuss project work
Piazza: Asking or answering questions that help the community learn
Peer support: Helping classmates troubleshoot code or understand concepts outside of class, or going the extra mile with peer feedback on portfolio pieces

Scoring Guidelines (per half-semester)

Score	Description
5 pts	Strong, consistent engagement in 2-3 categories. Your self-assessment provides specific examples that demonstrate meaningful contribution to your own learning and/or the class community.
4 pts	Solid engagement in at least 1 category, or moderate engagement across several. Examples show genuine participation but may be less frequent or less impactful.
3 pts	Some engagement but inconsistent. Attended class but rarely contributed beyond that. Limited examples to cite.
1-2 pts	Minimal engagement. Sporadic attendance or participation. Few meaningful examples.

Writing Your Self-Assessment

In your reflection, address:

Which categories did you focus on? (You don't need to do all of them.)
What specific examples demonstrate your engagement? (e.g., "I asked about X in lecture on [date]," "I helped [classmate] debug their portfolio piece," "I answered questions on Piazza about neural networks")
What score do you believe you earned (out of 5) and why?

You can describe general patterns of engagement, but include at least 2-3 specific examples to support your case. The teaching team will confirm your assessment or follow up if we see it differently.

Example Self-Assessments

Example A (requesting 5/5):

I focused on lecture participation and peer support this half of the semester. I attended every lecture, regularly asked questions, and often led small-group work and discussions, such as when I presented for our group in lectures 4 and 6. I also helped several classmates outside of class: I spent about an hour helping Jordan debug a shape mismatch error in their CNN for Portfolio Piece 1, and I worked through the backpropagation math with Alex before the midterm. I'm requesting 5 points because I consistently engaged in two categories and contributed to both my own and my classmates' learning.

Example B (requesting 4/5):

My main form of participation was attending office hours. I came to office hours three times to ask questions about my portfolio piece: once about feature engineering, once about hyperparameter tuning, and once to get feedback on my analysis before submitting. I attended most lectures, and even though I didn't ask many questions in class, I participated in group work and feel that I was fully engaged there. I'm requesting 4 points because I have been attentive to the course in multiple ways but have been actively engaged in just one category.

Lauren's CDS593 Materials