- Project Timeline
  - Released Thu 9/26
  - Form partners by Sun 10/6
  - Due Thu 10/17
  - Intermediate deadlines for the code/experimentation and writeup of Problem 1 and Problem 2
- Team Formation
  - Teams of 2 are encouraged, but working individually is allowed
  - Fill out the ProjectA Team Formation Form by 10/6
  - If you need help finding a teammate, post on Piazza
- Work to Complete
  - One semi-open problem (Problem 1) and one completely open problem (Problem 2)
  - Practice the ML development cycle on both problems
  - Leaderboards for both problems are maintained on Gradescope
- PDF Report
  - One report covering both problems, 4–6 pages
  - Manually graded
  - Mark where each subproblem is answered using Gradescope's annotation tool
- ZIP Files of Predictions
  - One ZIP file for Problem 1 and one for Problem 2
  - Each contains a single plain-text file of predicted probabilities (floats) for the test set
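A minimal sketch of packaging one problem's predictions, assuming one probability per line in test-set order. The helper `write_prediction_zip` and the file name `yproba1_test.txt` used below are hypothetical; check the assignment instructions for the exact file names and layout the leaderboard expects.

```python
import os
import tempfile
import zipfile


def write_prediction_zip(probas, txt_name, zip_path):
    """Write one probability per line to txt_name and zip that single file.

    Hypothetical helper: the one-value-per-line layout and the file names are
    assumptions; check the assignment for the exact required format.
    """
    probas = [float(p) for p in probas]  # also converts NumPy floats
    for p in probas:
        # Predictions must be probabilities; NaN also fails this check
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range [0, 1]: {p}")
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = os.path.join(tmp, txt_name)
        with open(txt_path, "w") as f:
            for p in probas:
                # repr gives the shortest string that round-trips, so no precision
                # is lost (rounding could create artificial ties in the ranking)
                f.write(repr(p) + "\n")  # line i is test example i
        # arcname stores the text file at the top level of the archive
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(txt_path, arcname=txt_name)
```

With a fitted scikit-learn classifier and 0/1 labels, the positive-class probabilities would come from the second column of `predict_proba`, e.g. `write_prediction_zip(model.predict_proba(x_test)[:, 1], "yproba1_test.txt", "problem1.zip")` (variable names hypothetical).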
- Reflection Form
  - Each team member individually submits a reflection form after the report is complete
- Starter Code Repo
  - https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA
  - Provides scripts to load the data, but no other code
- Code Usage
  - Any Python package may be used
  - You must understand and cite any third-party code you use
- Dataset
  - From research published in a KDD 2015 paper
  - Thousands of single-sentence reviews from imdb.com, amazon.com, and yelp.com
  - Training set of 2400 examples and test set of 600 examples, both in CSV format
  - Binary labels indicating sentiment (positive or negative)
- Performance Metric
  - Area under the ROC curve (AUROC)
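AUROC can be read as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it directly from that definition; the function name `auroc_pairwise` is hypothetical, and its quadratic cost makes it suitable only for checking small cases.

```python
from itertools import product


def auroc_pairwise(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    Runs in O(P * N) time, so it is for illustration, not large test sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    correct = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            correct += 1.0
        elif sp == sn:
            correct += 0.5
    return correct / (len(pos) * len(neg))
```

In practice, `sklearn.metrics.roc_auc_score` computes the same quantity; both return 0.75 for labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`. Because only the ordering of the scores matters, any monotonic rescaling of the predicted probabilities leaves AUROC unchanged.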
- Background on Bag-of-Words (BoW)
  - Represent each document as a vector of word counts over a fixed vocabulary
  - Involves many design decisions, such as tokenization, vocabulary size, and handling of rare or unseen words
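The sketch below builds count vectors by hand to make a few of these design decisions explicit: lowercasing, a regular-expression tokenizer, a minimum-count threshold for the vocabulary, and silently dropping words never seen in training. The helper names and the token pattern are illustrative assumptions, not the starter code's API.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters and apostrophes
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text, then keep runs of letters/apostrophes as tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocab(docs, min_count=1):
    """Map each word seen at least min_count times in the training docs to a column index."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}


def bow_vector(text, vocab):
    """Count vector of length len(vocab); out-of-vocabulary words are dropped."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


vocab = build_vocab(["Great phone, great price!", "Terrible battery."])
print(bow_vector("great battery, awful", vocab))  # [1, 1, 0, 0, 0]
```

The vocabulary should be built from training documents only. scikit-learn's `CountVectorizer` exposes comparable choices through parameters such as `lowercase`, `token_pattern`, `ngram_range`, and `min_df` (which thresholds document frequency rather than the total count used here).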
- Problem 1: Goals and Tasks
  - Develop a BoW representation and a binary classifier pipeline
  - Experiment with text preprocessing choices
  - Use the LogisticRegression classifier
  - Select hyperparameters using cross-validation
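One way to set up Problem 1's pipeline and hyperparameter search with scikit-learn is sketched below. The grid values, the choice of `min_df` as the only BoW setting searched, and 5-fold stratified splitting are all assumptions to adapt, and `make_search` is a hypothetical helper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline


def make_search(n_splits=5, random_state=0):
    """Grid search over BoW and LogisticRegression settings, scored by AUROC.

    Putting CountVectorizer inside the Pipeline means the vocabulary is
    refit on the training folds only, so validation folds never leak into it.
    """
    pipeline = Pipeline([
        ("bow", CountVectorizer(lowercase=True)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Assumed grid: 2 x 5 = 10 candidate settings
    param_grid = {
        "bow__min_df": [1, 2],
        "clf__C": [0.01, 0.1, 1.0, 10.0, 100.0],  # inverse regularization strength
    }
    # Stratified folds keep the positive/negative ratio similar in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return GridSearchCV(
        pipeline,
        param_grid,
        scoring="roc_auc",
        cv=cv,
        return_train_score=True,  # keeps mean_train_score for train-vs-validation plots
    )
```

After `search.fit(x_train_text, y_train)` (variable names hypothetical), `search.best_params_` holds the selected settings, and `search.cv_results_` contains `mean_train_score` and `mean_test_score` for every grid point, which is the raw material for a train-versus-validation AUROC figure across values of `C`. By default (`refit=True`), `GridSearchCV` refits the best pipeline on all of the training data.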
- Problem 1: Report Sections
  - 1A: Describe BoW design decisions
  - 1B: Describe cross-validation design
  - 1C: Describe hyperparameter selection for the classifier
  - 1D: Analyze predictions of the best classifier
  - 1E: Report test-set performance on the leaderboard
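For the prediction and error analyses, a simple starting point is to list the most confident mistakes on held-out data. The sketch below assumes you already have held-out texts, labels, and predicted probabilities (for example from `sklearn.model_selection.cross_val_predict` with `method="predict_proba"`); the 0.5 threshold and the function name `rank_errors` are assumptions.

```python
def rank_errors(texts, y_true, probas, threshold=0.5, k=5):
    """Return the k most confident false positives and false negatives.

    threshold is an assumption: AUROC itself uses no cutoff, but one is needed
    to call an individual prediction right or wrong.
    """
    false_pos, false_neg = [], []
    for text, y, p in zip(texts, y_true, probas):
        y_hat = 1 if p >= threshold else 0
        if y_hat == 1 and y == 0:
            false_pos.append((p, text))
        elif y_hat == 0 and y == 1:
            false_neg.append((p, text))
    # Most confident mistakes first: highest p among false positives,
    # lowest p among false negatives
    false_pos.sort(key=lambda pair: pair[0], reverse=True)
    false_neg.sort(key=lambda pair: pair[0])
    return false_pos[:k], false_neg[:k]
```

Reading the returned sentences side by side often shows which phenomena (negation, sarcasm, rare words) a given representation handles poorly.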
- Problem 2: Goals and Tasks
  - Use any feature representation, classifier, and hyperparameter selection procedure
  - Try various methods to improve performance
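For Problem 2, candidate pipelines can be compared on identical cross-validation folds before committing to one. The candidates below (TF-IDF with bigrams, multinomial naive Bayes) are illustrative assumptions rather than recommendations, each is left at default hyperparameters, and `compare_candidates` is a hypothetical helper; a real comparison would also tune each candidate, for example with a nested search.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compare_candidates(texts, labels, n_splits=5, random_state=0):
    """Return the mean cross-validated AUROC of each candidate pipeline."""
    candidates = {
        "unigram counts + logistic regression": make_pipeline(
            CountVectorizer(), LogisticRegression(max_iter=1000)),
        "uni+bigram TF-IDF + logistic regression": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000)),
        "unigram counts + multinomial naive Bayes": make_pipeline(
            CountVectorizer(), MultinomialNB()),
    }
    # A fixed integer random_state makes every call to split() produce the same
    # folds, so all candidates are scored on identical train/validation splits
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return {
        name: cross_val_score(model, texts, labels, cv=cv, scoring="roc_auc").mean()
        for name, model in candidates.items()
    }
```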
- Problem 2: Report Sections
  - 2A: Describe the feature representation
  - 2B: Describe the cross-validation (or equivalent) process
  - 2C: Describe the classifier and hyperparameter search
  - 2D: Analyze errors of the best classifier
  - 2E: Report test-set performance on the leaderboard
- Overall Grade Breakdown
  - 87%: PDF report
  - 10%: Leaderboard submissions
  - 3%: Completion of the reflection form
- Leaderboard Submissions
  - Scored between 0.0 and 1.0 based on your performance relative to the top submissions
- PDF Report
  - Points are allocated across the parts of Problem 1 and Problem 2
- Hyperparameter Selection Rubric
  - Figure and paragraph requirements for describing hyperparameter selection

# CS135 F24 Project A