, , ,

[SOLVED] Cs5644: assignment #2

$25

File Name: Cs5644:_assignment_#2.zip
File Size: 197.82 KB

5/5 - (1 vote)

(https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records). The dataset contains two files: one with a “.names” suffix and one with a “.data” suffix. The actual data is in the “.data” suffix and “.names” describes the metadata (i.e., describes what the different columns mean). Note that each row of the “.data” file contains one instance and includes both features and the class label (please take care to note the order). The machine learning  problem  here  is  to  take  the  votes  of  US congressmen/congresswomen as input and predict whether they are a Republican or a Democrat. In particular, our goal is to solve this problem using both decision trees and a naïve Bayes classifier. First, spend some time understanding the structure of the dataset, how the instances are organized, how the features/class are organized, and so on. You need to “massage” this data into the form that scikit-learn requires before you can apply either a decision tree or a naïve Bayes classifier. So spend some time understanding and planning how you will do this massaging. You can do this in Python or in Excel or any way you choose. Note that this step is a natural part of the machine learning and knowledge discovery process. Data is rarely given in the  form  that  machine  learning  can  be  directly  applied,  so  that considerable effort goes into cleaning, manipulating, and massaging it. Do not apply scikit-learn before ensuring that it is in the form required. Just like the  PlayTennis  dataset,  the  features  are  binary-valued  but note  that some features have missing values for some rows (instances). You need to decide how you will handle them.There are three possibilities here: i) discard instances that have missing feature values, ii) treat “missing” as if it is a value (and thus a binary feature becomes a ternary, or three-valued, feature), iii) impute missing values (i.e., for each feature, replace missing values with the most common value for that feature), so that they are no longer missing or unknown. If you read the “.notes” file, it explains why some values are missing and what they mean.    (continued)What to submit: Exactly one zipped file containing:  

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] Cs5644: assignment #2
$25