CS135 Project A: Classifying Sentiment
-
Overview
Project Timeline
Release on Thu 9/26
Form partners by Sun 10/06
Due on Thu 10/17
Intermediate deadlines for Problem 1 and Problem 2 code/experimentation and writeup
Team Formation
Encouraged to work in teams of 2, but can work individually
Fill out ProjectA Team Formation Form by 10/6
If you need help finding a teammate, post on Piazza
Work to Complete
One semi-open problem (Problem 1) and one completely open problem (Problem 2)
Practice ML development cycle for both problems
Maintain leaderboards on Gradescope
-
What to Turn In
PDF Report
One report covering all problems, 4 to 6 pages
Manually graded
Mark subproblems via Gradescope annotation tool
ZIP Files of Predictions
One ZIP file for Problem 1 and one for Problem 2
Each contains a single plain text file with float probabilities for test set predictions
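A minimal sketch of producing such a file, assuming one probability per line. The file names `yproba1_test.txt` and `problem1_predictions.zip` and the three probability values are placeholders for illustration, not names or data confirmed by the handout; check the assignment for the exact names the autograder expects.

```python
import zipfile

# Hypothetical predicted probabilities of the positive class, one per test example.
yproba_test = [0.91, 0.07, 0.55]

# Assumed file names; the required names may differ.
txt_name = "yproba1_test.txt"
zip_name = "problem1_predictions.zip"

# Write one float per line to a plain text file.
with open(txt_name, "w") as f:
    for p in yproba_test:
        f.write(f"{p:.6f}\n")

# Package the single text file into a ZIP archive for upload.
with zipfile.ZipFile(zip_name, "w") as zf:
    zf.write(txt_name)
```

The text file should contain exactly as many lines as there are test examples, in the same order as the test set.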
Reflection Form
Each individual turns in a reflection form after completing the report
-
Starter Code and Code Restrictions
Starter Code Repo
https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA
Provides scripts to load data, but no other code
Code Usage
Can use any Python package
Understand and cite third-party code
-
Background
Dataset
From research work in KDD 2015 paper
Thousands of single-sentence reviews from imdb.com, amazon.com, yelp.com
Training set of 2400 examples, test set of 600 examples in CSV format
Binary labels indicating sentiment
Performance Metric
Area under the ROC curve (AUROC)
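To make the metric concrete, here is a small sketch of one standard way to read AUROC: the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
score = auroc(y_true, y_score)  # 0.75
```

Because only the ordering of scores matters, AUROC is computed from predicted probabilities rather than from hard 0/1 predictions.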
-
Problem 1: Bag-of-Words Feature Representation
Background on Bag-of-Words
Represent documents as count vectors of a fixed vocabulary
Many design decisions involved
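The idea can be sketched in plain Python as follows. This is an illustration only: the helper names (`tokenize`, `build_vocabulary`, `to_count_vector`), the punctuation handling, and the two example sentences are assumptions, and each choice (lowercasing, punctuation stripping, the `min_count` cutoff) is exactly the kind of design decision the project asks you to make and justify.

```python
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    # Design decision (assumed here): lowercase, split on whitespace,
    # and strip leading/trailing punctuation from each token.
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocabulary(docs, min_count=1):
    # Fix the vocabulary from the training documents only;
    # words seen fewer than min_count times are dropped.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}


def to_count_vector(doc, vocab):
    # One entry per vocabulary word; out-of-vocabulary words are ignored.
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


docs = ["Great phone, great price!", "The battery was not great."]
vocab = build_vocabulary(docs)
# vocab order: battery, great, not, phone, price, the, was
vectors = [to_count_vector(d, vocab) for d in docs]
# vectors[0] == [0, 2, 0, 1, 1, 0, 0]
# vectors[1] == [1, 1, 1, 0, 0, 1, 1]
```

Note that word order is discarded: "not great" and "great, not" map to the same vector, which is one limitation worth discussing in the report.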
Goals and Tasks
Develop BoW representation and binary classifier pipeline
Experiment with preprocessing
Use LogisticRegression classifier
Use hyperparameter selection techniques with cross-validation
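The tasks above can be sketched with scikit-learn as a single pipeline tuned by grid search. This is one possible setup, not the required one: the twelve toy sentences, the grid values for `min_df` and `C`, and the choice of 3 stratified folds are all assumptions made so the example runs quickly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Tiny made-up dataset (the real project data is loaded by the starter code).
texts = [
    "great phone and great price", "loved the food", "the movie was great",
    "excellent service and great food", "loved this movie", "the price was excellent",
    "bad phone and bad price", "hated the food", "the movie was bad",
    "terrible service and bad food", "hated this movie", "the price was terrible",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Putting the vectorizer inside the pipeline means the vocabulary is rebuilt
# from the training folds only, so no information leaks from validation folds.
pipeline = Pipeline([
    ("bow", CountVectorizer(lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed grid: one BoW setting and the regularization strength C.
param_grid = {
    "bow__min_df": [1, 2],
    "clf__C": [0.01, 1.0, 100.0],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(texts, labels)

best_params = search.best_params_
best_cv_auroc = search.best_score_

# Probabilities for the positive class, in the form the leaderboard expects.
proba = search.predict_proba(["great food"])[:, 1]
```

`search.cv_results_` holds the per-setting mean and standard deviation of the validation score, which is the raw material for a hyperparameter selection figure.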
Report Sections
1A: Describe BoW design decisions
1B: Describe cross-validation design
1C: Describe hyperparameter selection for classifier
1D: Analyze predictions of best classifier
1E: Report test set performance on leaderboard
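For the prediction analysis in 1D, one possible starting point is to separate held-out examples into false positives and false negatives and read through them. The sketch below assumes a 0.5 decision threshold; the function name `split_errors` and the example sentences and probabilities are made up for illustration.

```python
def split_errors(texts, y_true, y_proba, threshold=0.5):
    """Return (false_positives, false_negatives) as lists of (text, probability)."""
    false_pos, false_neg = [], []
    for text, y, p in zip(texts, y_true, y_proba):
        y_hat = 1 if p >= threshold else 0
        if y_hat == 1 and y == 0:
            false_pos.append((text, p))
        elif y_hat == 0 and y == 1:
            false_neg.append((text, p))
    return false_pos, false_neg


# Hypothetical validation examples with true labels and predicted probabilities.
val_texts = ["not good at all", "could not be happier", "loved it", "awful"]
val_labels = [0, 1, 1, 0]
val_proba = [0.72, 0.31, 0.95, 0.08]

false_pos, false_neg = split_errors(val_texts, val_labels, val_proba)
# false_pos == [("not good at all", 0.72)]
# false_neg == [("could not be happier", 0.31)]
```

Grouping the mistakes this way makes it easier to look for patterns, such as negation words or review source, that the classifier handles poorly.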
-
Problem 2: Open-ended challenge
Goals and Tasks
Use any feature representation, classifier, and hyperparameter selection procedure
Try various methods to improve performance
Report Sections
2A: Describe feature representation
2B: Describe cross-validation or equivalent process
2C: Describe classifier and hyperparameter search
2D: Analyze errors of best classifier
2E: Report test set performance on leaderboard
-
Grading
Overall Grade Breakdown
87%: Report performance
10%: Leaderboard submissions
3%: Completion of reflection
Leaderboard Submissions
Score between 0.0 and 1.0 based on performance and comparison to top submissions
PDF Report
Points allocated across various parts of Problem 1 and Problem 2
Hyperparameter Selection Rubric
Figure and paragraph requirements for describing hyperparameter selection