
CS135 Project A: Classifying Sentiment

Overview

Project Timeline

Release on Thu 9/26

Form partners by Sun 10/06

Due on Thu 10/17

Intermediate deadlines for Problem 1 and Problem 2 code/experimentation and writeup

Team Formation

Encouraged to work in teams of 2, but can work individually

Fill out the ProjectA Team Formation Form by 10/6

If you need help finding a teammate, post on Piazza

Work to Complete

One semi-open problem (Problem 1) and one completely open problem (Problem 2)

Practice the ML development cycle for both problems

Maintain leaderboards on Gradescope

What to Turn In

PDF Report

One report covering all problems, 4-6 pages

Manually graded

Mark subproblems via Gradescope annotation tool

ZIP Files of Predictions

One ZIP file for Problem 1 and one for Problem 2

Each contains a single plain text file with float probabilities for test set predictions
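The packaging step above can be sketched as follows. This is a minimal illustration, not the required format: the file names `yproba1_test.txt` and `predictions_problem1.zip`, the helper `write_prediction_zip`, and the one-probability-per-line layout are assumptions, so check the assignment instructions for the exact names and layout expected by the leaderboard.

```python
import os
import tempfile
import zipfile


def write_prediction_zip(probas, zip_path, txt_name="yproba1_test.txt"):
    """Write one float probability per line to a text file inside a ZIP.

    txt_name is a hypothetical file name; use the name the assignment requires.
    """
    for p in probas:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    lines = "".join(f"{p:.6f}\n" for p in probas)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(txt_name, lines)
    return zip_path


# Demo with made-up probabilities, written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_zip = write_prediction_zip(
    [0.91, 0.08, 0.5], os.path.join(demo_dir, "predictions_problem1.zip")
)

# Read the archive back to confirm it holds a single plain text file.
with zipfile.ZipFile(demo_zip) as zf:
    demo_names = zf.namelist()
    demo_values = [float(s) for s in zf.read(demo_names[0]).decode().split()]
```

Reading the archive back before uploading is a cheap way to catch a wrong file count or a stray header line.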

Reflection Form

Each individual turns in a reflection form after completing the report

Starter Code and Code Restrictions

Starter Code Repo

https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA

Provides scripts to load data, but no other code

Code Usage

Can use any Python package

Understand and cite third-party code

Background

Dataset

From research work published in a KDD 2015 paper

Thousands of single-sentence reviews from imdb.com, amazon.com, and yelp.com

Training set of 2400 examples, test set of 600 examples in CSV format

Binary labels indicating sentiment

Performance Metric

Area under the ROC curve (AUROC)
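AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on made-up toy data; the function name `auroc` is illustrative. In practice, `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half correct.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data: 3 of the 4 (positive, negative) pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 0.75
```

Because only the ranking of scores matters, AUROC does not depend on the threshold used to turn probabilities into hard labels.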
Problem 1: Bag-of-Words Feature Representation

Background on Bag-of-Words

Represent documents as count vectors over a fixed vocabulary

Many design decisions involved
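To make the count-vector idea concrete, here is a small pure-Python sketch. The tokenizer (lowercasing, splitting on non-alphanumeric characters), the `min_count` threshold, and all function names are assumptions chosen for illustration. They are examples of the design decisions mentioned above, not recommended settings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs, min_count=1):
    """Map each token seen at least min_count times to a column index."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def to_count_vector(doc, vocab):
    """Count vocabulary tokens in doc; out-of-vocabulary tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


# Made-up example sentences, not taken from the project dataset.
train_docs = ["Great movie, great cast!", "Not a great phone."]
vocab = build_vocab(train_docs)  # columns: a, cast, great, movie, not, phone
vectors = [to_count_vector(d, vocab) for d in train_docs]
unseen = to_count_vector("Terrible phone", vocab)  # "terrible" is dropped
```

The vocabulary is built from training documents only, so words that first appear at test time (such as "terrible" here) contribute nothing to the vector.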

Goals and Tasks

Develop BoW representation and binary classifier pipeline

Experiment with preprocessing

Use LogisticRegression classifier

Use hyperparameter selection techniques with cross-validation
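One way the tasks above could fit together with scikit-learn is sketched below. The toy sentences and labels are made up for illustration (the real project loads the training CSV with the starter scripts), and the grid values, fold count, and choice of `CountVectorizer` are assumptions rather than settings prescribed by the assignment.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Toy stand-in data; the real training set has 2400 labeled reviews.
texts = [
    "great product", "loved it", "works great", "really good food",
    "excellent service", "good value", "terrible product", "hated it",
    "works badly", "really bad food", "awful service", "poor value",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Keeping the vectorizer inside the pipeline means the vocabulary is
# rebuilt from the training folds only, so held-out folds stay unseen.
pipeline = Pipeline([
    ("bow", CountVectorizer(lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid only; sensible ranges depend on your own experiments.
param_grid = {
    "bow__min_df": [1, 2],
    "clf__C": [0.01, 1.0, 100.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",  # matches the project's performance metric
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(texts, labels)

best_params = search.best_params_
best_cv_auroc = search.best_score_
# Column 1 of predict_proba is the probability of the positive class.
train_probas = search.predict_proba(texts)[:, 1]
```

Searching over the vectorizer and classifier settings jointly lets cross-validation compare whole pipelines instead of tuning each stage in isolation.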

Report Sections

1A: Describe BoW design decisions

1B: Describe cross-validation design

1C: Describe hyperparameter selection for classifier

1D: Analyze predictions of best classifier

1E: Report test set performance on leaderboard

Problem 2: Open-ended challenge

Goals and Tasks

Use any feature representation, classifier, and hyperparameter selection procedure

Try various methods to improve performance

Report Sections

2A: Describe feature representation

2B: Describe cross-validation or equivalent process

2C: Describe classifier and hyperparameter search

2D: Analyze errors of best classifier

2E: Report test set performance on leaderboard

Grading