CS156 Homework #3: Classification

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the RMS Titanic, widely considered unsinkable, sank after colliding with an iceberg. Unfortunately, there weren't enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this assignment, we ask you to build a predictive model that answers the question "What sorts of people were more likely to survive?" using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Overview

The dataset, called Data-Hw3.csv, consists of 891 entries. Split it into two groups, reserving 25% of the data for the test set.

The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the ground truth) for each passenger. Your model will be based on features like the passengers' gender and class.

The test set should be used to see how well your model performs on unseen data. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

The dataset for this project contains information about the passengers aboard the Titanic and whether they survived the historic accident. There are 8 columns:

  1. passenger ID: an identifier for the passenger
  2. name: name of the passenger
  3. sex: male or female
  4. age: age in years
  5. sibsp: # of siblings / spouses aboard the Titanic
  6. parch: # of parents / children aboard the Titanic
  7. pclass: ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
  8. survived: 0 = no, 1 = yes

Variable Notes

pclass: A proxy for socio-economic status (SES). 1st = Upper, 2nd = Middle, 3rd = Lower.

age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.

sibsp: The dataset defines family relations this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).

parch: The dataset defines family relations this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch = 0 for them.

Part (A): Data Import, Data Pre-processing

  1. Read the file Data-Hw3.csv
  2. Replace Missing Data to make the data set complete
  3. Divide the data set into Training set and Test set
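A minimal sketch of the three steps above, assuming pandas and scikit-learn. The inline DataFrame here is a hypothetical stand-in for Data-Hw3.csv (same columns); in the actual assignment the first step would be `df = pd.read_csv("Data-Hw3.csv")`. Mean imputation for missing ages is one common choice; the assignment leaves the replacement strategy open.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for Data-Hw3.csv; replace with pd.read_csv("Data-Hw3.csv")
df = pd.DataFrame({
    "sex":      ["male", "female", "female", "male", "male", "female", "male", "female"],
    "age":      [22.0, 38.0, None, 35.0, 4.0, 27.0, 54.0, 2.0],
    "sibsp":    [1, 1, 0, 1, 3, 0, 0, 1],
    "parch":    [0, 0, 0, 0, 1, 2, 0, 1],
    "pclass":   [3, 1, 3, 1, 3, 3, 1, 3],
    "survived": [0, 1, 1, 1, 0, 1, 0, 1],
})

# Step 2: fill missing ages with the mean age (one common strategy)
df["age"] = df["age"].fillna(df["age"].mean())

# Encode sex numerically so the classifiers can use it
df["sex"] = df["sex"].map({"male": 0, "female": 1})

X = df[["sex", "age", "sibsp", "parch", "pclass"]]
y = df["survived"]

# Step 3: 75% training / 25% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```

Fixing `random_state` makes the split reproducible, so the accuracy numbers in Parts (B) through (H) are comparable across runs.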

For each model in Parts (B) through (H), use the following data set to test your model:

Data set to test your models:

  1. [sex = male, age = 4, sibsp = 0, parch = 0, pclass = 3]
  2. [sex = male, age = 4, sibsp = 4, parch = 0, pclass = 3]
  3. [sex = male, age = 4, sibsp = 0, parch = 5, pclass = 3]
  4. [sex = male, age = 4, sibsp = 0, parch = 0, pclass = 1]
  5. [sex = male, age = 40, sibsp = 0, parch = 0, pclass = 3]
  6. [sex = male, age = 40, sibsp = 4, parch = 0, pclass = 3]
  7. [sex = male, age = 40, sibsp = 0, parch = 5, pclass = 3]
  8. [sex = male, age = 40, sibsp = 0, parch = 0, pclass = 1]
  9. [sex = female, age = 4, sibsp = 0, parch = 0, pclass = 3]
  10. [sex = female, age = 4, sibsp = 4, parch = 0, pclass = 3]
  11. [sex = female, age = 4, sibsp = 0, parch = 5, pclass = 3]
  12. [sex = female, age = 4, sibsp = 0, parch = 0, pclass = 1]
  13. [sex = female, age = 40, sibsp = 0, parch = 0, pclass = 3]
  14. [sex = female, age = 40, sibsp = 4, parch = 0, pclass = 3]
  15. [sex = female, age = 40, sibsp = 0, parch = 5, pclass = 3]
  16. [sex = female, age = 40, sibsp = 0, parch = 0, pclass = 1]
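The 16 probe points follow a pattern: every combination of sex in {male, female}, age in {4, 40}, and one of four (sibsp, parch, pclass) settings. Assuming pandas, they can be generated rather than typed out by hand (the variable names here are illustrative, not part of the assignment):

```python
import pandas as pd

# Every combination, in the order listed above (sex varies slowest)
rows = [
    {"sex": sex, "age": age, "sibsp": sibsp, "parch": parch, "pclass": pclass}
    for sex in ["male", "female"]
    for age in [4, 40]
    for (sibsp, parch, pclass) in [(0, 0, 3), (4, 0, 3), (0, 5, 3), (0, 0, 1)]
]
probe = pd.DataFrame(rows)
print(probe)
```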

Part (B): Use Logistic Regression to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above
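A sketch of the four required outputs using scikit-learn's `LogisticRegression`. Synthetic data stands in for the features and labels produced in Part (A); with the real data, `X` and `y` come from the preprocessing step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 891 samples, 5 features, binary target
X, y = make_classification(n_samples=891, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# 1. Prediction vs. ground truth (first five pairs shown)
for pred, truth in list(zip(y_pred, y_test))[:5]:
    print(pred, truth)

# 2. Confusion matrix (rows = true class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# 3. Accuracy = fraction of test passengers classified correctly
print(accuracy_score(y_test, y_pred))
```

Output 4 is obtained by calling `clf.predict(...)` on the 16-point probe set given above (encoded the same way as the training features).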

Part (C): Use K Nearest Neighbor Classification with 7 neighbors to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above

Part (D): Use Support Vector Machine (SVM) Classification to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above

Part (E): Use Kernel Support Vector Machine (K-SVM) Classification to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above

Part (F): Use Naïve Bayes Classification to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above

Part (G): Use Decision Tree Classification to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above

Part (H): Use Random Forest Classification with 10 Decision Trees to predict if a passenger in the Test set will survive the accident

  1. Print the prediction and the corresponding ground truth in the Test set
  2. Print the Confusion Matrix
  3. Compute Accuracy
  4. Print and Tabulate the result for the data set given above
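Parts (C) through (H) repeat the Part (B) pattern with a different classifier. Assuming scikit-learn, all seven models can be trained and scored in one loop. The hyperparameters follow the assignment (7 neighbors, 10 trees); using a linear kernel for Part (D) and an RBF kernel for Part (E) is one common reading of "SVM" vs. "Kernel SVM". Synthetic data again stands in for the Part (A) split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Logistic Regression": LogisticRegression(),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=7),
    "SVM (linear)":        SVC(kernel="linear"),
    "Kernel SVM (RBF)":    SVC(kernel="rbf"),
    "Naive Bayes":         GaussianNB(),
    "Decision Tree":       DecisionTreeClassifier(random_state=0),
    "Random Forest":       RandomForestClassifier(n_estimators=10, random_state=0),
}

X, y = make_classification(n_samples=891, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.3f}")
```

The same loop can emit each model's predictions for the 16 probe points, which is exactly the summary table requested below.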

Summarize your observations in terms of:

  1. Tabulate the result of prediction from each of the models for the 16 dataset points
| Point | LogReg | KNN | SVM | K-SVM | NB | DT | RF |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| 4 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 10 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 13 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 15 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 16 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| Accuracy | 0.7848 | 0.7803 | 0.7803 | 0.6592 | 0.7758 | 0.9238 | 0.9283 |

(LogReg = Logistic Regression, KNN = K-Nearest Neighbors, K-SVM = Kernel SVM, NB = Naïve Bayes, DT = Decision Tree, RF = Random Forest; accuracies rounded to 4 decimal places.)
  2. Which three predictive models performed the best?

Based on the accuracy scores above: Random Forest (0.9283), Decision Tree (0.9238), and Logistic Regression (0.7848).

  3. What could possibly make the top 3 models outperform the rest?

Decision Trees and Random Forests are non-linear models that can capture interactions between features (e.g., the combination of sex, age, and pclass), whereas Logistic Regression and a linear SVM can only fit a linear decision boundary. A Random Forest additionally averages many trees, which tends to reduce the overfitting of a single Decision Tree.
