Assessment Rubrics
CDS 593 - Spring 2026
This document contains the rubrics used to evaluate your work in this course: portfolio pieces, the final project (or paper alternative), and participation. Use these rubrics to understand expectations and guide your work.
Portfolio Piece Rubric
Total: 25 points
Each portfolio piece is assessed on five categories. The same rubric applies to both Portfolio Pieces 1 and 2.
| Category | Excellent (5) | Proficient (4) | Developing (3) | Beginning (1-2) |
|---|---|---|---|---|
| Conceptual Understanding | Explains why specific methods were chosen; connects to course material; reasoning is clear and accurate | Shows solid grasp of concepts; explanations are mostly accurate | Partial understanding; some misconceptions; explanations lack depth | Significant conceptual errors; misapplies methods |
| Technical Implementation | Code runs without errors; all components work correctly; handles edge cases | Code works for main use cases; minor bugs don't affect results | Code runs with some errors; missing components; bugs affect results | Code doesn't run or is severely incomplete |
| Code Quality & Documentation | Clear structure and naming; notebook tells a story (problem, approach, results, analysis); visualizations support the narrative | Readable code with good organization; good explanations of main steps | Hard to follow; sparse explanations; reader must infer what's happening | Disorganized; minimal or no explanation; no visualizations |
| Critical Analysis | Interprets results thoughtfully; discusses limitations and tradeoffs; compares approaches | Reasonable interpretation; mentions some limitations | Reports metrics without explaining what they mean | Shows outputs without analysis |
| Peer Reviews | Constructive feedback on 2 projects; identifies specific strengths and areas for improvement | Adequate feedback on 2 projects; notes what worked and what could improve | Vague or surface-level feedback; may only review 1 project | No peer reviews or unhelpful feedback |
What We're Looking For
Conceptual Understanding: We want to see that you understand the why, not just the what. Why did you choose this model? Why these hyperparameters? What are the tradeoffs?
Technical Implementation: Your code should run cleanly when we execute it. Test your notebook from top to bottom before submitting.
Code Quality & Documentation: Write code that your classmates could read and learn from. Your notebook should read like a report, not a code dump. Guide the reader through your thinking.
Critical Analysis: Don't just report numbers; interpret them. What do the results tell you? What are the limitations? What would you do differently?
Peer Reviews: Provide feedback that would actually help your classmates improve. Be specific about what worked and what could be better.
Final Project Rubric
Total: 50 points
The final project is assessed on eight categories. Scope & Ambition and Evaluation & Analysis are each worth 10 points (scored on a 1-10 scale) because they're where the most important learning happens. The remaining categories are worth 5 points each. Proposal and checkpoint deliverables are graded separately for completion and are not included in this rubric.
See the project guide for scope expectations by team size, project ideas, and tips.
Scope & Ambition (10 points)
This is where team-size expectations are reflected. A pair doing a solo-sized project, or a trio doing a pair-sized project, will lose points here.
| Score | Description |
|---|---|
| 9-10 | Tackles a genuinely challenging problem with clear motivation. Scope is appropriate for team size. Goes beyond a tutorial or obvious first approach. Solo projects show depth; team projects show depth and breadth. |
| 7-8 | Reasonable challenge with a clear problem statement. Some creativity or a solid execution of a non-trivial approach. Scope is mostly appropriate for team size. |
| 5-6 | Too simple, too ambitious, or scope doesn't match team size. Follows existing examples closely without adding much. |
| 1-4 | Inappropriate scope. Minimal originality. Could have been done in an afternoon, or was so ambitious that nothing works. |
Design Decisions (5 points)
| Score | Description |
|---|---|
| 5 | Explains why specific tools, models, and strategies were chosen. Considered alternatives and can articulate tradeoffs. Write-up shows clear reasoning, not just "I used X." |
| 4 | Explains most choices with reasonable justification. Some decisions are stated without alternatives considered. |
| 3 | Describes what was done but not why. Limited evidence of considering alternatives. |
| 1-2 | No justification for choices. Appears to have used defaults without thought. |
Technical Execution (5 points)
| Score | Description |
|---|---|
| 5 | Code runs reliably. Architecture is sensible and well-organized. Implementation demonstrates skill and care. |
| 4 | Solid implementation. Mostly works. Reasonable structure with minor issues. |
| 3 | Partial implementation. Significant bugs or architectural problems that affect results. |
| 1-2 | Doesn't work, or major components are missing. |
Use of Course Concepts (5 points)
| Score | Description |
|---|---|
| 5 | Deep application of multiple course concepts. Makes connections across topics (e.g., links attention mechanisms to retrieval strategy, or connects alignment concepts to evaluation choices). |
| 4 | Good application of relevant concepts. Demonstrates solid understanding. |
| 3 | Basic application. Some misunderstandings or limited depth. Uses course vocabulary without demonstrating understanding. |
| 1-2 | Minimal connection to course material. Fundamental conceptual errors. |
Evaluation & Analysis (10 points)
Double-weighted because this is where most projects fall short. "I built it and it works" is not enough.
| Score | Description |
|---|---|
| 9-10 | Rigorous evaluation with appropriate metrics and baselines. Includes error analysis: what kinds of inputs does it fail on, and why? Discusses limitations honestly. Results are reproducible. |
| 7-8 | Solid evaluation with reasonable metrics and at least one baseline comparison. Mentions limitations. Some error analysis. |
| 5-6 | Basic evaluation. Reports metrics but doesn't dig into what they mean. No baseline, or baseline is trivial. Limitations mentioned in passing. |
| 3-4 | Minimal evaluation. Shows outputs without measuring quality. No comparison. |
| 1-2 | No meaningful evaluation. |
Iteration & Reflection (5 points)
| Score | Description |
|---|---|
| 5 | Write-up tells the story of the process, not just the final product. Documents what was tried and abandoned, what didn't work and why, and what the team would do differently with more time. Shows genuine learning from failures. |
| 4 | Mentions some iteration. Discusses at least one thing that didn't work and how the approach changed. |
| 3 | Mostly describes the final system. Limited evidence of iteration or learning from mistakes. |
| 1-2 | No evidence of trying more than one approach. No reflection on process. |
Ethics & Limitations (5 points)
| Score | Description |
|---|---|
| 5 | Thoughtful consideration of who's affected, what could go wrong, and what the system doesn't capture. Addresses bias, safety, or fairness concerns specific to this project (not boilerplate). Considers deployment implications. |
| 4 | Discusses relevant ethical considerations with some specificity. Identifies real limitations. |
| 3 | Surface-level ethics discussion. Generic statements that could apply to any LLM project. |
| 1-2 | No meaningful engagement with ethics or limitations. |
Documentation & Presentation (5 points)
| Score | Description |
|---|---|
| 5 | Clear, well-organized write-up that tells a compelling story. Presentation is engaging and well-paced. Code is readable and documented. Someone could pick up your repo and understand what you did. |
| 4 | Good write-up and presentation. Organized and clear, with minor gaps. |
| 3 | Adequate but unclear in places. Reader has to work to follow the narrative. |
| 1-2 | Disorganized. Hard to follow. Code is a mess. |
Group Projects
For group projects, include a brief statement of who contributed what. Each team member will also complete a peer evaluation. Individual grades may be adjusted based on contribution.
Paper Presentation Rubric (Alternative to Final Project)
Total: 30 points
This option is for students who prefer a more theoretical approach. It requires critical analysis of a significant LLM paper plus a working demo. Contact the instructor by Week 4 to discuss paper selection and scheduling. Paper presentations can only be done solo (not in a group).
| Category | Excellent (5) | Proficient (4) | Developing (3) | Beginning (1-2) |
|---|---|---|---|---|
| Proposal & Preparation | Timely paper selection; clear proposal explaining approach and demo plan; well-prepared for scheduled slot | Good proposal and preparation with minor gaps | Late or incomplete proposal; some preparation issues | Missing or late proposal; significantly underprepared |
| Paper Understanding | Demonstrates deep understanding of the paper's contributions, methods, and context; can answer questions beyond what's in the paper | Solid understanding of main contributions and methods; minor gaps in technical details | Surface-level understanding; summarizes but doesn't fully grasp key ideas | Misunderstands core concepts; significant errors in explanation |
| Critical Analysis | Identifies strengths, limitations, and open questions; compares to related work; situates paper in broader context; offers original insights | Good discussion of strengths and limitations; some comparison to related work | Basic critique; mostly descriptive rather than analytical | No meaningful critique; just summarizes the paper |
| Implementation/Demo | Working demo that illustrates key concepts; helps audience understand the paper's contributions in practice | Functional demo that adds value to the presentation | Minimal demo; doesn't go much beyond showing existing outputs | No demo or demo doesn't work |
| Teaching & Accessibility | Makes complex material accessible; clear visualizations and explanations; audience leaves with solid understanding | Good explanations; most classmates can follow along | Some parts unclear or too technical for audience | Inaccessible to classmates; poor explanations |
| Presentation Delivery | Clear, well-organized, engaging; appropriate pacing; handles questions well | Good organization and clarity; answers most questions adequately | Somewhat disorganized or hard to follow; struggles with some questions | Confusing presentation; unable to answer basic questions |
Participation Rubric
Total: 10 points (5 points for each half of the semester, 10% of course grade)
Participation is assessed through self-reflection. At the midpoint and end of the semester, you'll submit a short reflection (1-2 paragraphs) making a case for your participation score, with specific examples of your contributions. The teaching team will review and confirm or adjust your self-assessment.
Ways to Participate
You don't need to engage in every category. Focus on 2-3 that fit your learning style:
- Lecture participation: Consistent attendance, asking and answering questions, engaging in group work
- Discussion section: Consistent attendance and active engagement
- Office hours: Coming to office hours to ask questions or discuss project work
- Piazza: Asking or answering questions that help the community learn
- Peer support: Helping classmates troubleshoot code or understand concepts outside of class, or going the extra mile with peer feedback on portfolio pieces
Scoring Guidelines (per half-semester)
| Score | Description |
|---|---|
| 5 pts | Strong, consistent engagement in 2-3 categories. Your self-assessment provides specific examples that demonstrate meaningful contribution to your own learning and/or the class community. |
| 4 pts | Solid engagement in at least 1 category, or moderate engagement across several. Examples show genuine participation but may be less frequent or less impactful. |
| 3 pts | Some engagement but inconsistent. Attended class but rarely contributed beyond that. Limited examples to cite. |
| 1-2 pts | Minimal engagement. Sporadic attendance or participation. Few meaningful examples. |
Writing Your Self-Assessment
In your reflection, address:
- Which categories did you focus on? (You don't need to do all of them.)
- What specific examples demonstrate your engagement? (e.g., "I asked about X in lecture on [date]," "I helped [classmate] debug their portfolio piece," "I answered questions on Piazza about neural networks")
- What score do you believe you earned (out of 5) and why?
You can describe general patterns of engagement, but include at least 2-3 specific examples to support your case. The teaching team will confirm your assessment or follow up if we see it differently.
Example Self-Assessments
Example A (requesting 5/5):
I focused on lecture participation and peer support this half of the semester. I attended every lecture, regularly asked questions, and often led small-group work, discussions, and presentations, such as when I presented for our group in lectures 4 and 6. I also helped several classmates outside of class: I spent about an hour helping Jordan debug a shape mismatch error in their CNN for Portfolio Piece 1, and I worked through the backpropagation math with Alex before the midterm. I'm requesting 5 points because I consistently engaged in two categories and contributed to both my own and my classmates' learning.
Example B (requesting 4/5):
My main form of participation was attending office hours. I came to office hours three times to ask questions about my portfolio piece: once about feature engineering, once about hyperparameter tuning, and once to get feedback on my analysis before submitting. I attended most lectures, and although I didn't ask many questions in class, I participated in group work and was fully engaged there. I'm requesting 4 points because I stayed attentive to the course in multiple ways but was actively engaged in just one category.