
CS535 Programming Assignment 4: Naïve Bayes

The purpose of this assignment is to familiarize you with sentiment classification. By the end of this assignment, you will have your very own "Sentiment Analyzer". You are given the Large Movie Review Dataset, which contains separate labeled train and test sets. Your task is to train a Naïve Bayes classifier on the train set and report its accuracy on the test set.

Dataset:
The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets. The overall distribution of labels is balanced (25k pos and 25k neg). There are two top-level directories [train/, test/] corresponding to the training and test sets. Each contains [pos/, neg/] directories for the reviews with binary labels positive and negative. Within these directories, reviews are stored in text files named according to the convention [[id]_[rating].txt], where [id] is a unique id and [rating] is the star rating for that review on a 1-10 scale. For example, the file [test/pos/200_8.txt] is the text for a positive-labeled test set example with unique id 200 and star rating 8/10 from IMDb. The dataset can be downloaded from here.

Preprocessing:
In the preprocessing step, you are required to remove stop words, punctuation marks, and other unwanted characters from the reviews and convert them to lowercase. You may find the string and regex modules useful for this purpose. A stop word list is provided with the dataset.

Part 1:
Implement Naïve Bayes for sentiment analysis from scratch, following the discussion in the class lectures. For an in-depth look at the Naïve Bayes classifier, feel free to read Chapter 4 (Sections 4.1, 4.2, and 4.3) of the Speech and Language Processing book. Use a bag-of-words representation and apply Laplace (add-1) smoothing as discussed in the class lectures. Specifically, you will need to implement the following algorithm:
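The pseudocode for this algorithm is not included here. As a stand-in, the sketch below shows a multinomial Naïve Bayes classifier over bag-of-words counts with add-1 smoothing, in the spirit of SLP Chapter 4. It is not necessarily line-for-line the algorithm from the lectures. With add-1 smoothing, the likelihood of word w in class c is (count(w, c) + 1) / (total word count in c + |V|), where |V| is the vocabulary size. All probabilities are kept as logarithms to avoid underflow on long reviews. The sketch assumes the reviews are already loaded as strings and that the provided stop word list has been read into a Python set. The function names preprocess, train_nb, predict_nb, and evaluate are hypothetical. The tokenizer is one possible reading of the preprocessing step: it removes HTML tags (such as `<br />`), lowercases the text, turns punctuation into spaces, and drops stop words.

```python
import math
import re
import string
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")  # HTML remnants such as <br />
# Map every punctuation character to a space so "awful.boring" splits in two.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words):
    """Hypothetical tokenizer: lowercase, strip tags/punctuation, drop stop words."""
    text = TAG_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in stop_words]


def train_nb(docs, labels):
    """docs: list of token lists; labels: parallel list of class labels."""
    n_docs = len(docs)
    doc_counts = Counter(labels)            # N_c: number of documents per class
    word_counts = defaultdict(Counter)      # class -> bag-of-words counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(doc_counts[c] / n_docs) for c in doc_counts}
    log_likelihood = {}
    for c in doc_counts:
        # Add-1 smoothing: each vocabulary word gets one extra count.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, classes):
    """Return accuracy and a confusion matrix indexed [true][predicted]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, matrix


if __name__ == "__main__":
    stop = {"the", "a", "is", "was", "this", "it"}  # toy list, not the provided one
    texts = ["A great, wonderful movie!", "This was awful.<br />Boring plot.",
             "Great acting and a great story", "Awful, boring, terrible"]
    labels = ["pos", "neg", "pos", "neg"]
    model = train_nb([preprocess(t, stop) for t in texts], labels)
    print(predict_nb(preprocess("What a great story", stop), *model))
```

On the real data, you would call train_nb on the 25k preprocessed training reviews, call predict_nb once per test review, and pass the true and predicted labels to evaluate with classes ["pos", "neg"]. The matrix rows are true labels and the columns are predicted labels.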
Report the accuracy and the confusion matrix. The expected accuracy on the test set is around 80%.

Part 2:
Use scikit-learn's CountVectorizer to transform your train and test sets into a bag-of-words representation, and use scikit-learn's Naïve Bayes implementation to train and test a Naïve Bayes classifier on the provided dataset. Use scikit-learn's accuracy_score function to calculate the accuracy and its confusion_matrix function to calculate the confusion matrix on the test set.
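Below is a minimal sketch of Part 2. It assumes the dataset has been extracted into a directory (called root here) with the train/ and test/ layout described in the Dataset section, and that the provided stop word list has been read into a Python set. The handout does not name a specific scikit-learn Naïve Bayes class. MultinomialNB is our choice because it models word counts, and alpha=1.0 corresponds to the add-1 smoothing of Part 1. The helper names load_split and run_part2 are hypothetical.

```python
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB


def load_split(root, split):
    """Hypothetical loader: read <root>/<split>/{pos,neg}/*.txt into (texts, labels)."""
    texts, labels = [], []
    for label in ("pos", "neg"):
        for path in sorted((Path(root) / split / label).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


def run_part2(root, stop_words):
    """Hypothetical driver: bag-of-words + MultinomialNB, evaluated on the test split."""
    train_texts, y_train = load_split(root, "train")
    test_texts, y_test = load_split(root, "test")

    vectorizer = CountVectorizer(lowercase=True, stop_words=list(stop_words))
    X_train = vectorizer.fit_transform(train_texts)  # vocabulary learned on train only
    X_test = vectorizer.transform(test_texts)        # words unseen in train are dropped

    clf = MultinomialNB(alpha=1.0)  # alpha=1.0 -> Laplace (add-1) smoothing
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    acc = accuracy_score(y_test, y_pred)
    # Rows are true labels, columns are predicted labels, in the order given.
    cm = confusion_matrix(y_test, y_pred, labels=["pos", "neg"])
    return acc, cm
```

Usage would look like `acc, cm = run_part2("aclImdb", stop_words)`, where "aclImdb" is a placeholder for wherever the dataset was extracted. CountVectorizer's default tokenizer keeps only tokens of two or more word characters, so its vocabulary need not match the Part 1 preprocessing exactly, and the two accuracies may differ slightly.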
