CS135 Project A: Classifying Sentiment
Project Timeline
Release on Thu 9/26
Form partners by Sun 10/06
Due on Thu 10/17
Intermediate deadlines for Problem 1 and Problem 2 code/experimentation and writeup
Team Formation
Encouraged to work in teams of 2, but can work individually
Fill out ProjectA Team Formation Form by 10/6
If you need help finding a teammate, post on Piazza
Work to Complete
One semi-open problem (Problem 1) and one completely open problem (Problem 2)
Practice ML development cycle for both problems
Maintain leaderboards on Gradescope
What to Turn In
PDF Report
One report covering all problems, 4 – 6 pages
Manually graded
Mark subproblems via Gradescope annotation tool
ZIP Files of Predictions
One ZIP file for Problem 1 and one for Problem 2
Each contains a single plain text file with float probabilities for test set predictions
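A minimal sketch of packaging predictions, assuming one probability per line. The file names `predictions.txt` and `problem1_predictions.zip` and the six-decimal format are hypothetical; the actual names and format should be taken from the assignment instructions.

```python
import tempfile
import zipfile
from pathlib import Path


def write_predictions_zip(probas, zip_path, txt_name="predictions.txt"):
    """Write one float probability per line into a single text file in a ZIP.

    txt_name is a hypothetical file name chosen for illustration.
    """
    for p in probas:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    body = "\n".join(f"{p:.6f}" for p in probas) + "\n"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(txt_name, body)
    return zip_path


# Toy probabilities standing in for real test-set predictions.
probas = [0.91, 0.08, 0.55]
out_dir = Path(tempfile.mkdtemp())
zip_path = write_predictions_zip(probas, out_dir / "problem1_predictions.zip")

# Read the archive back to confirm it holds exactly one text file.
with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
    content = zf.read(names[0]).decode()
```

With real data, the list would hold one probability per test example, in the same order as the test CSV.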
Reflection Form
Each individual turns in a reflection form after completing the report
Starter Code and Code Restrictions
Starter Code Repo
Provides scripts to load data, but no other code
Code Usage
Can use any Python package
Understand and cite third-party code
Dataset
From research work in a KDD 2015 paper
Thousands of single-sentence reviews from imdb.com, amazon.com, yelp.com
Training set of 2400 examples, test set of 600 examples in CSV format
Binary labels indicating sentiment
Performance Metric
Area under the ROC curve (AUROC)
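As an illustration of the metric, the sketch below computes AUROC on four toy examples with scikit-learn's `roc_auc_score`, then recomputes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The labels and probabilities are made up for the example.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (not project data).
y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]

auroc = roc_auc_score(y_true, y_proba)

# Equivalent pairwise view: how often a positive outranks a negative.
pos = [p for p, y in zip(y_proba, y_true) if y == 1]
neg = [p for p, y in zip(y_proba, y_true) if y == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
pairwise = wins / (len(pos) * len(neg))

print(auroc, pairwise)  # 0.75 0.75
```

Because AUROC depends only on how the scores rank the examples, it is computed from probabilities rather than thresholded 0/1 predictions.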
Problem 1: Bag-of-Words Feature Representation
Background on Bag-of-Words
Represent documents as count vectors of a fixed vocabulary
Many design decisions involved
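A small sketch of a count-vector representation using scikit-learn's `CountVectorizer` with its default tokenizer (lowercasing, tokens of two or more alphanumeric characters). The two sentences are invented, and the defaults shown are only one of many possible design choices.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two invented single-sentence reviews.
docs = [
    "The food was good, really good.",
    "The service was bad.",
]

vectorizer = CountVectorizer(lowercase=True)
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# Vocabulary terms ordered by their column index.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocab)
# ['bad', 'food', 'good', 'really', 'service', 'the', 'was']
print(X.toarray())
# [[0 1 2 1 0 1 1]
#  [1 0 0 0 1 1 1]]
```

Options such as `min_df`, `max_df`, `stop_words`, `ngram_range`, and `binary` on `CountVectorizer` correspond to some of the design decisions mentioned above.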
Goals and Tasks
Develop BoW representation and binary classifier pipeline
Experiment with preprocessing
Use LogisticRegression classifier
Use hyperparameter selection techniques with cross-validation
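The tasks above can be sketched as a single scikit-learn pipeline tuned by grid search. This is one possible setup, not the required one: the toy sentences, the grid values, and the choice of 3 stratified folds are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Invented toy data; the real training set has 2400 labeled sentences.
texts = [
    "the food was great", "the service was excellent",
    "the movie was wonderful and fun", "the phone was great and works well",
    "the staff was friendly and helpful", "the plot was excellent and fun",
    "the food was terrible", "the service was awful",
    "the movie was boring and slow", "the phone was broken and slow",
    "the staff was rude and unhelpful", "the plot was awful and boring",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Fitting the vectorizer inside the pipeline keeps each validation fold
# out of the vocabulary-building step.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: one BoW setting and the regularization strength C.
param_grid = {
    "bow__min_df": [1, 2],
    "clf__C": [0.01, 1.0, 100.0],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(texts, labels)

print(search.best_params_, search.best_score_)
```

Scoring with `"roc_auc"` keeps model selection aligned with the project's performance metric.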
Report Sections
1A: Describe BoW design decisions
1B: Describe cross-validation design
1C: Describe hyperparameter selection for classifier
1D: Analyze predictions of best classifier
1E: Report test set performance on leaderboard
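For the prediction analysis in 1D, one possible starting point is to list false positives and false negatives, most confident mistakes first. The helper name `find_errors`, the 0.5 threshold, and the toy sentences are assumptions for illustration; in practice the probabilities would come from held-out validation folds.

```python
import numpy as np


def find_errors(texts, y_true, y_proba, threshold=0.5):
    """Return (false_positives, false_negatives) as lists of (text, proba).

    Hypothetical helper: thresholds probabilities at `threshold` and sorts
    each error list so the most confident mistakes come first.
    """
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba, dtype=float)
    y_pred = (y_proba >= threshold).astype(int)

    fp_idx = np.flatnonzero((y_pred == 1) & (y_true == 0))
    fn_idx = np.flatnonzero((y_pred == 0) & (y_true == 1))

    fp_idx = fp_idx[np.argsort(-y_proba[fp_idx])]  # highest proba first
    fn_idx = fn_idx[np.argsort(y_proba[fn_idx])]   # lowest proba first

    fp = [(texts[i], float(y_proba[i])) for i in fp_idx]
    fn = [(texts[i], float(y_proba[i])) for i in fn_idx]
    return fp, fn


# Invented examples: 1 = positive sentiment, 0 = negative sentiment.
texts = ["not bad at all", "great", "awful", "could be better", "not good"]
y_true = [1, 1, 0, 0, 0]
y_proba = [0.2, 0.9, 0.1, 0.7, 0.8]

false_pos, false_neg = find_errors(texts, y_true, y_proba)
print(false_pos)  # [('not good', 0.8), ('could be better', 0.7)]
print(false_neg)  # [('not bad at all', 0.2)]
```

Reading the two lists side by side can surface patterns, such as negation in this toy example, that a plain count vector does not capture.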
Problem 2: Open-ended challenge
Goals and Tasks
Use any feature representation, classifier, and hyperparameter selection procedure
Try various methods to improve performance
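One way to try several methods fairly is to score each candidate pipeline on the same cross-validation splits. The two candidates below (TF-IDF with bigrams plus logistic regression, and counts plus multinomial naive Bayes) are arbitrary examples, not recommendations from the assignment, and the data is invented.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data standing in for the real training set.
texts = [
    "the food was great", "the service was excellent",
    "the movie was wonderful and fun", "the phone was great and works well",
    "the staff was friendly and helpful", "the plot was excellent and fun",
    "the food was terrible", "the service was awful",
    "the movie was boring and slow", "the phone was broken and slow",
    "the staff was rude and unhelpful", "the plot was awful and boring",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical candidates, chosen only to show the comparison pattern.
candidates = {
    "tfidf_bigrams_logreg": make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    ),
    "counts_naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
}

# A fixed splitter gives every candidate identical folds.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, texts, labels, scoring="roc_auc", cv=cv).mean()
    for name, model in candidates.items()
}

for name, score in results.items():
    print(f"{name}: mean validation AUROC = {score:.3f}")
```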
Report Sections
2A: Describe feature representation
2B: Describe cross-validation or equivalent process
2C: Describe classifier and hyperparameter search
2D: Analyze errors of best classifier
2E: Report test set performance on leaderboard
Overall Grade Breakdown
87%: Report performance
10%: Leaderboard submissions
3%: Completion of reflection
Leaderboard Submissions
Score between 0.0 and 1.0 based on performance and comparison to top submissions
PDF Report
Points allocated across various parts of Problem 1 and Problem 2
Hyperparameter Selection Rubric
Figure and paragraph requirements for describing hyperparameter selection
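A figure for hyperparameter selection typically needs training and validation scores across a range of hyperparameter values. The sketch below collects those numbers with scikit-learn's `validation_curve`; the toy data, the grid over `C`, and the 3-fold split are assumptions, and the plotting step is left out.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.pipeline import Pipeline

# Invented toy data standing in for the real training set.
texts = [
    "the food was great", "the service was excellent",
    "the movie was wonderful and fun", "the phone was great and works well",
    "the staff was friendly and helpful", "the plot was excellent and fun",
    "the food was terrible", "the service was awful",
    "the movie was boring and slow", "the phone was broken and slow",
    "the staff was rude and unhelpful", "the plot was awful and boring",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

C_grid = np.logspace(-2, 2, 5)  # hypothetical grid: 0.01, 0.1, 1, 10, 100
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Each returned array has shape (len(C_grid), n_folds).
train_scores, valid_scores = validation_curve(
    pipeline, texts, labels,
    param_name="clf__C", param_range=C_grid,
    cv=cv, scoring="roc_auc",
)
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for C, tr, va in zip(C_grid, train_mean, valid_mean):
    print(f"C={C:8.2f}  train AUROC={tr:.3f}  validation AUROC={va:.3f}")
```

Plotting `train_mean` and `valid_mean` against `C_grid` on a log-scaled x-axis makes under- and overfitting regions visible, which gives the accompanying paragraph something concrete to discuss.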