Assessment Rubrics

CDS 593 - Spring 2026

This document contains the rubrics used to evaluate your work in this course: portfolio pieces, the final project (or paper alternative), and participation. Use these rubrics to understand expectations and guide your work.


Portfolio Piece Rubric

Total: 25 points

Each portfolio piece is assessed on five categories. The same rubric applies to both Portfolio Pieces 1 and 2.

Category | Excellent (5) | Proficient (4) | Developing (3) | Beginning (1-2)
Conceptual Understanding | Explains why specific methods were chosen; connects to course material; reasoning is clear and accurate | Shows solid grasp of concepts; explanations are mostly accurate | Partial understanding; some misconceptions; explanations lack depth | Significant conceptual errors; misapplies methods
Technical Implementation | Code runs without errors; all components work correctly; handles edge cases | Code works for main use cases; minor bugs don't affect results | Code runs with some errors; missing components; bugs affect results | Code doesn't run or is severely incomplete
Code Quality & Documentation | Clear structure and naming; notebook tells a story (problem, approach, results, analysis); visualizations support the narrative | Readable code with good organization; good explanations of main steps | Hard to follow; sparse explanations; reader must infer what's happening | Disorganized; minimal or no explanation; no visualizations
Critical Analysis | Interprets results thoughtfully; discusses limitations and tradeoffs; compares approaches | Reasonable interpretation; mentions some limitations | Reports metrics without explaining what they mean | Shows outputs without analysis
Peer Reviews | Constructive feedback on 2 projects; identifies specific strengths and areas for improvement | Adequate feedback on 2 projects; notes what worked and what could improve | Vague or surface-level feedback; may only review 1 project | No peer reviews or unhelpful feedback

What We're Looking For

Conceptual Understanding: We want to see that you understand the why, not just the what. Why did you choose this model? Why these hyperparameters? What are the tradeoffs?

Technical Implementation: Your code should run cleanly when we execute it. Test your notebook from top to bottom before submitting.

Code Quality & Documentation: Write code that your classmates could read and learn from. Your notebook should read like a report, not a code dump. Guide the reader through your thinking.

Critical Analysis: Don't just report numbers; interpret them. What do the results tell you? What are the limitations? What would you do differently?

Peer Reviews: Provide feedback that would actually help your classmates improve. Be specific about what worked and what could be better.
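
For the Technical Implementation check above (running the notebook from top to bottom), one possible approach is sketched below. It assumes Jupyter and nbconvert are installed and on your PATH; the file name `portfolio_piece_1.ipynb` and both helper function names are hypothetical, not something the course requires.

```python
import subprocess


def build_execute_command(notebook_path, output_name="executed_check.ipynb"):
    """Build an nbconvert command that runs every cell in order in a fresh kernel.

    The executed copy is written to output_name so the original is untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output", output_name,
        notebook_path,
    ]


def notebook_runs_cleanly(notebook_path):
    """Return True if the notebook executes top to bottom without a cell error."""
    result = subprocess.run(
        build_execute_command(notebook_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # nbconvert reports the failing cell and traceback on stderr.
        print(result.stderr)
    return result.returncode == 0
```

Calling `notebook_runs_cleanly("portfolio_piece_1.ipynb")` before you submit is roughly equivalent to choosing "Restart Kernel and Run All Cells" in the Jupyter interface, which works just as well.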


Final Project Rubric

Total: 50 points

The final project is assessed on eight categories. Scope & Ambition and Evaluation & Analysis are each worth 10 points (scored on a 1-10 scale) because they're where the most important learning happens. The remaining categories are worth 5 points each. Proposal and checkpoint deliverables are graded separately for completion and are not included in this rubric.

See the project guide for scope expectations by team size, project ideas, and tips.
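
As a quick check of the point breakdown described above, the sketch below adds up the category maximums and totals one set of scores. The category names and maximums come from this rubric; the function name and the example scores are made up for illustration, and the range check (0 up to the category maximum) is an assumption rather than stated grading policy.

```python
# Maximum points per category, taken from the final project rubric.
FINAL_PROJECT_MAX_POINTS = {
    "Scope & Ambition": 10,
    "Design Decisions": 5,
    "Technical Execution": 5,
    "Use of Course Concepts": 5,
    "Evaluation & Analysis": 10,
    "Iteration & Reflection": 5,
    "Ethics & Limitations": 5,
    "Documentation & Presentation": 5,
}


def total_score(scores):
    """Sum per-category scores after checking each one against its maximum."""
    total = 0
    for category, max_points in FINAL_PROJECT_MAX_POINTS.items():
        if category not in scores:
            raise KeyError(f"Missing score for {category!r}")
        score = scores[category]
        # Assumption: a score may not be negative or exceed the category maximum.
        if not 0 <= score <= max_points:
            raise ValueError(f"{category}: {score} is outside 0-{max_points}")
        total += score
    return total


# Hypothetical scores for one project (not real data).
example_scores = {
    "Scope & Ambition": 8,
    "Design Decisions": 4,
    "Technical Execution": 5,
    "Use of Course Concepts": 4,
    "Evaluation & Analysis": 7,
    "Iteration & Reflection": 4,
    "Ethics & Limitations": 3,
    "Documentation & Presentation": 5,
}

print(sum(FINAL_PROJECT_MAX_POINTS.values()))  # 50
print(total_score(example_scores))             # 40
```

The two 10-point categories account for 20 of the 50 points, so a weak evaluation or a mismatched scope costs more than a weakness in any single 5-point category.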

Scope & Ambition (10 points)

This is where team-size expectations are reflected. A pair doing a solo-sized project, or a trio doing a pair-sized project, will lose points here.

Score | Description
9-10 | Tackles a genuinely challenging problem with clear motivation. Scope is appropriate for team size. Goes beyond a tutorial or obvious first approach. Solo projects show depth; team projects show depth and breadth.
7-8 | Reasonable challenge with a clear problem statement. Some creativity or a solid execution of a non-trivial approach. Scope is mostly appropriate for team size.
5-6 | Too simple, too ambitious, or scope doesn't match team size. Follows existing examples closely without adding much.
1-4 | Inappropriate scope. Minimal originality. Could have been done in an afternoon, or was so ambitious that nothing works.

Design Decisions (5 points)

Score | Description
5 | Explains why specific tools, models, and strategies were chosen. Considered alternatives and can articulate tradeoffs. Write-up shows clear reasoning, not just "I used X."
4 | Explains most choices with reasonable justification. Some decisions are stated without alternatives considered.
3 | Describes what was done but not why. Limited evidence of considering alternatives.
1-2 | No justification for choices. Appears to have used defaults without thought.

Technical Execution (5 points)

Score | Description
5 | Code runs reliably. Architecture is sensible and well-organized. Implementation demonstrates skill and care.
4 | Solid implementation. Mostly works. Reasonable structure with minor issues.
3 | Partial implementation. Significant bugs or architectural problems that affect results.
1-2 | Doesn't work, or major components are missing.

Use of Course Concepts (5 points)

Score | Description
5 | Deep application of multiple course concepts. Makes connections across topics (e.g., links attention mechanisms to retrieval strategy, or connects alignment concepts to evaluation choices).
4 | Good application of relevant concepts. Demonstrates solid understanding.
3 | Basic application. Some misunderstandings or limited depth. Uses course vocabulary without demonstrating understanding.
1-2 | Minimal connection to course material. Fundamental conceptual errors.

Evaluation & Analysis (10 points)

Double-weighted because this is where most projects fall short. "I built it and it works" is not enough.

Score | Description
9-10 | Rigorous evaluation with appropriate metrics and baselines. Includes error analysis: what kinds of inputs does it fail on, and why? Discusses limitations honestly. Results are reproducible.
7-8 | Solid evaluation with reasonable metrics and at least one baseline comparison. Mentions limitations. Some error analysis.
5-6 | Basic evaluation. Reports metrics but doesn't dig into what they mean. No baseline, or baseline is trivial. Limitations mentioned in passing.
3-4 | Minimal evaluation. Shows outputs without measuring quality. No comparison.
1-2 | No meaningful evaluation.
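
To make "baseline comparison" and "error analysis" concrete, here is a minimal sketch for a classification task. Everything in it is an illustrative assumption: the labels and predictions are toy data invented for the example, the function names are hypothetical, and a majority-class baseline is only one simple choice (a stronger project would also compare against a non-trivial baseline suited to its task).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every test example."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


def errors_by_true_label(y_true, y_pred):
    """Count mistakes per true label, a starting point for error analysis."""
    return Counter(t for t, p in zip(y_true, y_pred) if t != p)


# Toy data, invented for illustration only.
y_train = ["pos", "pos", "pos", "neg"]
y_test = ["pos", "neg", "pos", "neg", "pos"]
y_model = ["pos", "neg", "pos", "pos", "pos"]

y_baseline = majority_baseline(y_train, len(y_test))

print(accuracy(y_test, y_baseline))           # 0.6
print(accuracy(y_test, y_model))              # 0.8
print(errors_by_true_label(y_test, y_model))  # Counter({'neg': 1})
```

Reporting 0.8 next to the 0.6 baseline shows how much the model actually adds, and the per-label error count points to where it fails (here, on a "neg" example), which is the question the top band of the rubric asks you to answer.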

Iteration & Reflection (5 points)

Score | Description
5 | Write-up tells the story of the process, not just the final product. Documents what was tried and abandoned, what didn't work and why, and what the team would do differently with more time. Shows genuine learning from failures.
4 | Mentions some iteration. Discusses at least one thing that didn't work and how the approach changed.
3 | Mostly describes the final system. Limited evidence of iteration or learning from mistakes.
1-2 | No evidence of trying more than one approach. No reflection on process.

Ethics & Limitations (5 points)

Score | Description
5 | Thoughtful consideration of who's affected, what could go wrong, and what the system doesn't capture. Addresses bias, safety, or fairness concerns specific to this project (not boilerplate). Considers deployment implications.
4 | Discusses relevant ethical considerations with some specificity. Identifies real limitations.
3 | Surface-level ethics discussion. Generic statements that could apply to any LLM project.
1-2 | No meaningful engagement with ethics or limitations.

Documentation & Presentation (5 points)

Score | Description
5 | Clear, well-organized write-up that tells a compelling story. Presentation is engaging and well-paced. Code is readable and documented. Someone could pick up your repo and understand what you did.
4 | Good write-up and presentation. Organized and clear, with minor gaps.
3 | Adequate but unclear in places. Reader has to work to follow the narrative.
1-2 | Disorganized. Hard to follow. Code is a mess.

Group Projects

For group projects, include a brief statement of who contributed what. Each team member will also complete a peer evaluation. Individual grades may be adjusted based on contribution.


Paper Presentation Rubric (Alternative to Final Project)

Total: 30 points

This option is for students who prefer a more theoretical approach. It requires critical analysis of a significant LLM paper plus a working demo. Contact the instructor by Week 4 to discuss paper selection and scheduling. Paper presentations can only be done solo (not in a group).

Category | Excellent (5) | Proficient (4) | Developing (3) | Beginning (1-2)
Proposal & Preparation | Timely paper selection; clear proposal explaining approach and demo plan; well-prepared for scheduled slot | Good proposal and preparation with minor gaps | Late or incomplete proposal; some preparation issues | Missing or late proposal; significantly underprepared
Paper Understanding | Demonstrates deep understanding of the paper's contributions, methods, and context; can answer questions beyond what's in the paper | Solid understanding of main contributions and methods; minor gaps in technical details | Surface-level understanding; summarizes but doesn't fully grasp key ideas | Misunderstands core concepts; significant errors in explanation
Critical Analysis | Identifies strengths, limitations, and open questions; compares to related work; situates paper in broader context; offers original insights | Good discussion of strengths and limitations; some comparison to related work | Basic critique; mostly descriptive rather than analytical | No meaningful critique; just summarizes the paper
Implementation/Demo | Working demo that illustrates key concepts; helps audience understand the paper's contributions in practice | Functional demo that adds value to the presentation | Minimal demo; doesn't go much beyond showing existing outputs | No demo or demo doesn't work
Teaching & Accessibility | Makes complex material accessible; clear visualizations and explanations; audience leaves with solid understanding | Good explanations; most classmates can follow along | Some parts unclear or too technical for audience | Inaccessible to classmates; poor explanations
Presentation Delivery | Clear, well-organized, engaging; appropriate pacing; handles questions well | Good organization and clarity; answers most questions adequately | Somewhat disorganized or hard to follow; struggles with some questions | Confusing presentation; unable to answer basic questions

Participation Rubric

Total: 10 points (5 points for each half of the semester, 10% of course grade)

Participation is assessed through self-reflection. At the midpoint and end of the semester, you'll submit a short reflection (1-2 paragraphs) making a case for your participation score, with specific examples of your contributions. The teaching team will review and confirm or adjust your self-assessment.

Ways to Participate

You don't need to engage in every category. Focus on 2-3 that fit your learning style:

  • Lecture participation: Consistent attendance, asking and answering questions, engaging in group work
  • Discussion section: Consistent attendance and active engagement
  • Office hours: Coming to office hours to ask questions or discuss project work
  • Piazza: Asking or answering questions that help the community learn
  • Peer support: Helping classmates troubleshoot code or understand concepts outside of class, or going the extra mile with peer feedback on portfolio pieces

Scoring Guidelines (per half-semester)

Score | Description
5 pts | Strong, consistent engagement in 2-3 categories. Your self-assessment provides specific examples that demonstrate meaningful contribution to your own learning and/or the class community.
4 pts | Solid engagement in at least 1 category, or moderate engagement across several. Examples show genuine participation but may be less frequent or less impactful.
3 pts | Some engagement but inconsistent. Attended class but rarely contributed beyond that. Limited examples to cite.
1-2 pts | Minimal engagement. Sporadic attendance or participation. Few meaningful examples.

Writing Your Self-Assessment

In your reflection, address:

  1. Which categories did you focus on? (You don't need to do all of them.)
  2. What specific examples demonstrate your engagement? (e.g., "I asked about X in lecture on [date]," "I helped [classmate] debug their portfolio piece," "I answered questions on Piazza about neural networks")
  3. What score do you believe you earned (out of 5) and why?

You can describe general patterns of engagement, but include at least 2-3 specific examples to support your case. The teaching team will confirm your assessment or follow up if we see it differently.

Example Self-Assessments

Example A (requesting 5/5):

I focused on lecture participation and peer support this half of the semester. I attended every lecture, regularly asked questions, and typically led small-group work, discussions, and presentations; for example, I presented for our group in lectures 4 and 6. I also helped several classmates outside of class: I spent about an hour helping Jordan debug a shape mismatch error in their CNN for Portfolio Piece 1, and I worked through the backpropagation math with Alex before the midterm. I'm requesting 5 points because I consistently engaged in two categories and contributed to both my own and my classmates' learning.

Example B (requesting 4/5):

My main form of participation was attending office hours. I came to office hours three times to ask questions about my portfolio piece: once about feature engineering, once about hyperparameter tuning, and once to get feedback on my analysis before submitting. I attended most lectures, and even though I didn't ask many questions in class, I participated in group work and feel I was fully engaged there. I'm requesting 4 points because I have been attentive to the course in multiple ways but have been actively engaged in just one category.