WEEK 2: AI-Assisted Development & NLP Intro
This week we have just one lecture because Monday's class was cancelled for snow. We'll focus on how to use AI tools effectively for coding, then introduce the foundations of classical NLP.
This week's checklist (due Friday 1/30)
- (Note: Monday 1/26 class is cancelled due to weather)
- Attend Discussion Section (Tue, Jan 27): Getting started with Google Colab, GitHub Classroom, and using Python for classical NLP
- Attend Lecture 2 (Wed, Jan 28): AI-assisted development + Classical NLP
- Complete Week 2 Reflection, pushed to GitHub
- Complete Lab 1, pushed to GitHub
This week's learning objectives
After Lecture 2 (Wed 1/28) students will be able to...
AI-Assisted Development:
- Identify appropriate AI coding tools for different development tasks (brainstorming, writing, debugging, understanding)
- Distinguish between AI coding interfaces (chat, edit mode, agentic) and when to use each
- Apply best practices for AI-assisted coding (verification, security awareness, understanding before shipping)
- Recognize common AI coding failures and when to be skeptical
Classical NLP:
- Explain the classical NLP pipeline: text to numbers to predictions
- Represent text documents using bag-of-words vectors
- Identify common preprocessing steps (lowercasing, stop-word removal, stemming, etc.)
- Implement n-gram models for simple text generation
- Recognize the limitations of counting-based approaches (no context, no word meaning)
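A minimal sketch of the "text to numbers" step: lowercase, tokenize, drop stop words, then count words against a fixed vocabulary. It uses only the Python standard library. The tiny stop-word list, the letters-only regular expression (which discards digits and apostrophes), and the function names `preprocess`, `build_vocab`, and `bag_of_words` are illustrative assumptions, not the course's template code; stemming is left out for brevity (libraries such as NLTK provide stemmers like `PorterStemmer`).

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; libraries like NLTK ship longer ones).
STOP_WORDS = {"the", "a", "an", "and", "is", "on", "of", "to"}


def preprocess(text):
    """Lowercase, keep runs of letters as tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def build_vocab(docs):
    """Assign each distinct token a column index, in alphabetical order."""
    words = sorted({tok for doc in docs for tok in preprocess(doc)})
    return {word: i for i, word in enumerate(words)}


def bag_of_words(doc, vocab):
    """Count vector over the vocabulary; out-of-vocabulary tokens are dropped."""
    vec = [0] * len(vocab)
    for tok, count in Counter(preprocess(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


docs = ["The cat sat on the mat.", "The dog sat on the log and the dog barked."]
vocab = build_vocab(docs)
print(list(vocab))                   # ['barked', 'cat', 'dog', 'log', 'mat', 'sat']
print(bag_of_words(docs[0], vocab))  # [0, 1, 0, 0, 1, 1]
print(bag_of_words(docs[1], vocab))  # [1, 0, 2, 1, 0, 1]
```

Because only counts survive, "dog bites man" and "man bites dog" produce identical vectors, which is exactly the "no context" limitation listed in the objectives.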
Discussion Section (Tue 1/27): Getting Started and Classical NLP
Note: This week's discussion happens before the lecture, so we'll use it as hands-on exploration rather than reinforcement.
Please bring your laptop to discussion! You will be coding during the section.
What you'll do:
- Get an introduction to Google Colab and set it up
- Briefly review Git and GitHub, and troubleshoot any issues that came up with Lab 0
- Start building on a template repo using a bag-of-words and TF-IDF approach to solve a text classification problem.
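For reference while working in the template repo, here is a hedged, from-scratch sketch of TF-IDF weighting on already-tokenized documents. It uses one common textbook variant: raw term count times unsmoothed inverse document frequency, tfidf(t, d) = count(t, d) × ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The function name `tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(tokenized_docs):
    """Raw-count TF times unsmoothed IDF, ln(N / df), for each document.

    Returns one dict per document mapping term -> TF-IDF weight.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log", "dog"],
]
for w in tfidf(docs):
    print({term: round(v, 3) for term, v in w.items()})
# {'cat': 0.693, 'sat': 0.0, 'mat': 0.693}
# {'dog': 1.386, 'sat': 0.0, 'log': 0.693}
```

"sat" appears in every document, so its weight drops to 0: IDF down-weights words that do not help tell documents apart. Library implementations often differ; scikit-learn's `TfidfVectorizer`, for example, defaults to a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its numbers will not match this sketch exactly.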
Week 2 Reflection Prompts
Write 300-500 words reflecting on this week's content, or the area in general. Some prompts to consider:
- What has your experience been using AI tools for coding so far? What works well? What doesn't?
- After learning about bag-of-words and n-grams, what surprised you about these simple approaches? What can they do well?
- How do you think about the tradeoff between using AI tools to move fast vs. understanding what the code does?
- What questions do you have about AI-assisted development and classical NLP that we didn't cover?
Remember to write in your own voice, without AI assistance. These reflections are graded on completion only and help me understand what's working for you.
Lab 1: Text Processing Basics
Due: Friday, Jan 30 by 11:59pm
Suggested explorations
- Build on the bag-of-words and TF-IDF work you began during discussion. What can you do to make the model better? Explore the impact of vocabulary size, data cleaning decisions, training set size, or the type of classifier.
- Experiment with n-gram text generation. Try 3-grams, 4-grams, and beyond. Is there a relationship between the size of the input dataset and the ideal n-gram length? Can you come up with a way to use a variable n-gram length (sometimes 1-grams, sometimes 2-grams, depending on the word or word pair)? What kinds of pairs are important to preserve?
- Find an interesting dataset to try these techniques on. Can you predict Amazon product star ratings from review text? Can you generate structured poetry, or jokes, with n-grams and a little cleverness?
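As a possible starting point for the n-gram exploration, here is a minimal sketch of count-based n-gram generation: record which word follows each (n-1)-word context, then repeatedly sample a follower of the most recent context. Sampling uniformly from the stored list of followers (duplicates included) picks each next word in proportion to how often it was seen. The function names `train_ngrams` and `generate`, the whitespace tokenization, and the toy sentence are assumptions for illustration.

```python
import random
from collections import defaultdict


def train_ngrams(tokens, n):
    """Map each (n-1)-word context tuple to the list of words that followed it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, seed, n, length, rng):
    """Append up to `length` sampled words to `seed` (which needs >= n-1 words).

    Stops early if the current context never appeared in training.
    """
    output = list(seed)
    for _ in range(length):
        # The last n-1 words form the context (the empty tuple when n == 1).
        context = tuple(output[len(output) - (n - 1):])
        followers = model.get(context)  # .get does not insert a default entry
        if not followers:
            break  # unseen context: a core limitation of pure counting
        output.append(rng.choice(followers))
    return output


text = "the cat sat on the mat and the cat ate the rat"
tokens = text.split()
model = train_ngrams(tokens, 3)
rng = random.Random(0)  # fixed seed so runs are repeatable
print(" ".join(generate(model, ["the", "cat"], 3, 10, rng)))
```

With a small corpus, raising n tends to make the output copy training text verbatim and to hit unseen contexts (an early stop) sooner, which is one way into the dataset-size question above.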
Resources for further learning
On AI coding tools
- A Practical Guide to AI-Assisted Coding Tools - a great guide to choosing the right AI and IDE for the job
- Making Tea While AI Codes - practical workflows and guardrails for AI-assisted development
- AI-Assisted Software Development: A Comprehensive Guide - how to actually use AI for coding, with prompt examples for different stages of a project
- GitHub Copilot Best Practices
- OpenAI - Best Practices for Prompt Engineering
Videos
- DataMListic - TF-IDF explained - A great explanation of BoW and TF-IDF with a computed example
Tutorials
- Understanding Bag of Words and TF-IDF: A Beginner-Friendly Guide
- N-gram Language Models - Chapter from Speech and Language Processing (more of a deep dive)