Data Description

The data can be downloaded from here.

This dataset was created from 64 emails collected from the DBWorld mailing list. Please note, the actual emails are not given to you, and the emails have already been processed using NLP.

There are two datasets, dbworld_bodies_stemmed and dbworld_subjects_stemmed, corresponding to the email body and the email subject respectively.

The data is currently represented as a binary stemmed bag-of-words and requires no additional NLP.

Each dataset is a table with 64 rows and n columns:
The 1st column is id, with values from 1 to 64, one for each of the 64 emails (this column can be removed).
Columns 2 to n-1 are the unique words found across all the emails. Their values are binary: 0 means the word did not appear in the email and 1 means it appeared.
The nth column is CLASS: 0 means discard the email and 1 means keep the email.
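The column layout above can be turned into feature vectors and labels with a few lines of Python. This is only a minimal sketch: the file format of the download is not stated here, so the example assumes the table has already been read into a list of integer rows, and the helper name split_features_and_labels is hypothetical.

```python
def split_features_and_labels(rows):
    """Split table rows into word features and class labels.

    Assumes each row has the layout [id, word_1, ..., word_k, CLASS],
    as described above. The id column is dropped.
    """
    features = [row[1:-1] for row in rows]   # columns 2 .. n-1
    labels = [row[-1] for row in rows]       # column n (CLASS)
    return features, labels


# Toy table with 3 emails and 4 word columns (made-up values).
toy_rows = [
    [1, 0, 1, 0, 0, 1],
    [2, 1, 0, 0, 1, 0],
    [3, 0, 0, 1, 0, 1],
]
X, y = split_features_and_labels(toy_rows)
```

Here X holds one binary vector per email and y holds the matching 0/1 labels.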

Naive Bayes Classifier

You should implement a Naive Bayes classifier from scratch (using the spam filter example discussed in class).

Also implement Laplacian smoothing to handle words not in the dictionary. (40 points)
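One possible shape for such a classifier is sketched below. It is not the in-class spam filter example, which is not reproduced here. It assumes a Bernoulli-style model in which each binary word feature contributes P(word present | class) or its complement, with add-alpha (Laplace) smoothing. The function names and the alpha parameter are illustrative choices.

```python
import math


def train_naive_bayes(X, y, alpha=1.0):
    """Estimate log priors and smoothed per-word probabilities per class."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n_c = len(rows)
        log_prior = math.log(n_c / len(X))
        # Laplace smoothing: add alpha to the count of emails containing
        # the word, and 2 * alpha to the total (two outcomes: 0 or 1).
        p_present = [
            (sum(r[j] for r in rows) + alpha) / (n_c + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (log_prior, p_present)
    return model


def predict(model, x):
    """Return the class with the highest log posterior score for x."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, p_present) in model.items():
        score = log_prior
        for xj, p in zip(x, p_present):
            score += math.log(p if xj == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data: word 0 marks class 1, word 1 marks class 0.
X_toy = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_toy = [1, 1, 0, 0]
nb_model = train_naive_bayes(X_toy, y_toy)
```

Because of the smoothing, a word that never appears in a class during training still gets a non-zero probability, so the log terms stay finite.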

Using the implemented algorithm, train and test the model for each dataset.

Use 80% of each class data to train your classifier and the remaining 20% to test it. Which dataset provides better classification i.e. email body or email subject? (20 points)
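The per-class 80/20 split can be sketched as follows. The shuffling, the fixed seed, and the function name stratified_split are assumptions for illustration. The assignment only requires that 80% of each class goes to training.

```python
import random


def stratified_split(X, y, train_fraction=0.8, seed=0):
    """Split so that each class contributes train_fraction of its rows."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        cut = int(round(train_fraction * len(idx)))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


# Toy example: 10 emails of class 1 and 5 of class 0 (made-up sizes).
X_demo = [[i] for i in range(15)]
y_demo = [1] * 10 + [0] * 5
X_tr, y_tr, X_te, y_te = stratified_split(X_demo, y_demo)
```

Splitting inside each class keeps the keep/discard ratio roughly the same in the training and test sets, which matters with only 64 emails.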

\[ f\text{-measure} = \frac{2 \cdot Pre \cdot Rec}{Pre + Rec} \]

where

\[ Pre = \frac{TP}{TP + FP}; \quad Rec = \frac{TP}{TP + FN} \]

and TP is the number of true positives (class 1 members predicted as class 1), TN is the number of true negatives (class 0 members predicted as class 0), FP is the number of false positives (class 0 members predicted as class 1), and FN is the number of false negatives (class 1 members predicted as class 0).
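The f-measure can be computed directly from these counts. The sketch below treats class 1 as the positive class. The function name f_measure and the choice to return 0.0 when a denominator is zero are assumptions, not part of the assignment text.

```python
def f_measure(y_true, y_pred, positive=1):
    """Compute the f-measure from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    pre = tp / (tp + fp) if (tp + fp) else 0.0   # precision
    rec = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    if pre + rec == 0:
        return 0.0
    return 2 * pre * rec / (pre + rec)


# Worked example: TP = 2, FP = 1, FN = 1, so Pre = Rec = 2/3.
score = f_measure([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In the worked example both precision and recall are 2/3, so the f-measure is also 2/3.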

Compare your classifier with the scikit-learn implementation (sklearn.naive_bayes.MultinomialNB).

Repeat the analysis from (b). (20 points)
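A minimal sketch of the scikit-learn side of the comparison is shown below, on made-up toy data rather than the DBWorld tables. It uses MultinomialNB with alpha=1.0 (add-one smoothing) and sklearn.metrics.f1_score for the f-measure. Note that MultinomialNB models word counts, so its predictions may differ from a from-scratch classifier that models word presence and absence.

```python
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB

# Toy binary bag-of-words data (illustrative values only).
X_train = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_train = [1, 1, 0, 0]
X_test = [[1, 0], [0, 1]]
y_test = [1, 0]

clf = MultinomialNB(alpha=1.0)   # alpha=1.0 is add-one (Laplace) smoothing
clf.fit(X_train, y_train)
sk_pred = clf.predict(X_test)

# f-measure with class 1 as the positive class.
sk_f1 = f1_score(y_test, sk_pred, pos_label=1)
```

Running the same train/test split through both classifiers and comparing the two f-measures gives a like-for-like comparison.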

[Solved] CS 534-Artificial Intelligence Assignment 5