1 Naive Bayes
The binary Naive Bayes classifier has interesting connections to the logistic regression classifier. You will show that, under certain assumptions, the Naive Bayes posterior P(Y = 1|X) is identical in form to the posterior of a logistic regression classifier. You will then derive the MLE parameter estimates under these assumptions.
- Suppose X = (X_1, ..., X_D) is a continuous random vector in R^D representing the features, and Y is a binary random variable with values in {0,1} representing the class labels. Let the following assumptions hold:
- The label variable Y follows a Bernoulli distribution with parameter π = P(Y = 1).
- For each feature X_j, P(X_j | Y = y_k) follows a Gaussian distribution of the form N(μ_{jk}, σ_j).
Using the Naive Bayes assumption, which states that for all j' ≠ j, X_j and X_{j'} are conditionally independent given Y, compute P(Y = 1|X) and show that it can be written in the following form:

P(Y = 1|X) = 1 / (1 + exp(-(w_0 + w^T X))).
Specifically, you need to find the explicit form of w_0 and w in terms of π, μ_{jk}, and σ_j, for j = 1, ..., D and k ∈ {0,1}. (A setup sketch appears at the end of this section.)
- Suppose a training set with N examples (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) is given, where x_i = (x_{i1}, ..., x_{iD})^T is a D-dimensional feature vector and y_i ∈ {0,1} is its corresponding label. Using the assumptions in 1.1 (not the result), provide the maximum likelihood estimates for the parameters of Naive Bayes under the Gaussian assumption. In other words, you need to provide the estimates for π, μ_{jk}, and σ_j, for j = 1, ..., D and k ∈ {0,1}.
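As a non-authoritative starting point for the two questions above (a setup sketch in the notation already defined, not the derivation itself), Bayes' rule combined with the conditional-independence assumption gives

\[
P(Y = 1 \mid X)
  = \frac{P(Y = 1) \prod_{j=1}^{D} P(X_j \mid Y = 1)}
         {P(Y = 1) \prod_{j=1}^{D} P(X_j \mid Y = 1) + P(Y = 0) \prod_{j=1}^{D} P(X_j \mid Y = 0)}.
\]

Dividing the numerator and denominator by the numerator, substituting the Gaussian densities N(μ_{jk}, σ_j) for the factors P(X_j | Y = y_k), and collecting the terms in the exponent is one route to the requested sigmoid form. For the second question, the estimates follow from maximizing the joint log-likelihood of the N training pairs under the same assumptions.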
2 Logistic Regression
Consider a binary logistic regression model, where the training samples are linearly separable.
- Given n training examples (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where y_i ∈ {0,1}, write down the negative log likelihood, L(w), in terms of the sigmoid function σ(z) = 1/(1 + exp(-z)), the inputs x_i, and the labels y_i.
- Is this loss function convex? Provide your reasoning.
- Show that the magnitude of the optimal w can go to infinity when the training samples are linearly separable.
- A convenient way to prevent such numerical instability issues is to add a penalty term to the negative log likelihood as follows:

L_reg(w) = L(w) + λ ||w||^2,    (1)

where ||w||^2 = Σ_i w_i^2 and λ > 0. Compute the gradient with respect to w_i, i.e., ∂L_reg(w)/∂w_i. (A numerical gradient-check sketch appears at the end of this section.)
- Show that the problem in Eq. (1) has a unique solution.
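The following is a minimal, hedged MATLAB/Octave sketch for numerically sanity-checking a hand-derived gradient of the penalized objective in Eq. (1). It assumes the specific form L_reg(w) = -Σ_i [y_i log σ(x_i^T w) + (1 - y_i) log(1 - σ(x_i^T w))] + λ ||w||^2 with placeholder random data; adjust it to match your own derivation.

% Finite-difference check of dL_reg/dw_i at a random point.
sigmoid = @(z) 1 ./ (1 + exp(-z));
loss = @(w, X, y, lambda) -sum(y .* log(sigmoid(X * w)) ...
                               + (1 - y) .* log(1 - sigmoid(X * w))) ...
                          + lambda * (w' * w);

n = 20; d = 3; lambda = 0.1;
X = randn(n, d); y = double(rand(n, 1) > 0.5); w = randn(d, 1);

eps_fd = 1e-6;
g_num = zeros(d, 1);
for i = 1:d
    e = zeros(d, 1); e(i) = eps_fd;
    g_num(i) = (loss(w + e, X, y, lambda) - loss(w - e, X, y, lambda)) / (2 * eps_fd);
end
disp(g_num');   % compare entry i against your analytic gradient dL_reg/dw_i at the same w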
3 Decision Trees
- Suppose you want to grow a decision tree to predict the accident rate based on the following accident data, which summarizes the rate of accidents across 100 observations. Which predictor variable (weather or traffic) would you choose to split on in the first step in order to maximize the information gain? (A computational sketch appears at the end of this section.)
Weather | Traffic | Accident Rate | Number of observations |
Sunny | Heavy | High | 23 |
Sunny | Light | Low | 5 |
Rainy | Heavy | High | 50 |
Rainy | Light | Low | 22 |
- Suppose that in another dataset, two students experiment with decision trees. The first student runs the decision tree learning algorithm on the raw data and obtains a tree T1. The second student normalizes the data by subtracting the mean and dividing by the variance of each feature, and then runs the same decision tree algorithm with the same parameters, obtaining a tree T2. How do T1 and T2 compare? Explain your reasoning.
- Consider the splitting criteria used by the decision tree learning algorithm. As discussed in ESL, Section 9.2.3, the most common splitting criteria are the Gini index and cross-entropy. Both of these can be viewed as smooth, concave surrogates for the misclassification error. Prove that, for any discrete probability distribution p with K classes, the value of the Gini index is less than or equal to the corresponding value of the cross-entropy. This implies that the Gini index more closely approximates the misclassification error.
Definitions: For a K-valued discrete random variable with probability mass function p_i, i = 1, ..., K, the Gini index is defined as G(p) = Σ_{i=1}^{K} p_i (1 - p_i), and the cross-entropy is defined as H(p) = -Σ_{i=1}^{K} p_i log(p_i).
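For the proof question above, one elementary inequality that may be useful (assuming the natural logarithm in the cross-entropy; for base 2 the bound only becomes looser) is

\[
\log p_i \le p_i - 1, \quad \text{equivalently} \quad -\log p_i \ge 1 - p_i, \qquad \text{for all } p_i \in (0, 1].
\]

Multiplying both sides by p_i ≥ 0 and summing over i compares the two impurity measures term by term.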
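For the accident-rate question at the start of this section, the following is a hedged MATLAB/Octave sketch (assuming the entropy-based definition of information gain and the counts from the table; the variable names are placeholders) of how the two candidate first splits could be compared numerically:

% Row order matches the table: Sunny/Heavy, Sunny/Light, Rainy/Heavy, Rainy/Light.
n_high = [23 0 50 0];        % observations with a High accident rate in each row
n_low  = [0 5 0 22];         % observations with a Low accident rate in each row
entropy = @(p) -sum(p(p > 0) .* log2(p(p > 0)));   % convention: 0 * log2(0) = 0

N = sum(n_high) + sum(n_low);
H_parent = entropy([sum(n_high), sum(n_low)] / N);

splits = {{[1 2], [3 4]}, ...   % Weather: Sunny rows vs. Rainy rows
          {[1 3], [2 4]}};      % Traffic: Heavy rows vs. Light rows
names = {'weather', 'traffic'};
for s = 1:numel(splits)
    H_children = 0;
    for b = 1:numel(splits{s})
        idx = splits{s}{b};
        nb = sum(n_high(idx)) + sum(n_low(idx));
        H_children = H_children + (nb / N) * ...
            entropy([sum(n_high(idx)), sum(n_low(idx))] / nb);
    end
    fprintf('IG(%s) = %.4f\n', names{s}, H_parent - H_children);
end

The same loop can be reused with the Gini index by swapping out the entropy function.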
4 Comparing Classifiers in MATLAB/Octave
In this problem, you will work with the same dataset as in HW1, and compare the performance of various classification algorithms. Starting with the one-hot-encoded version of the data that you generated in Question 4a in HW1, perform the following steps:
- Fill in the function naive_bayes in the naive.m file. In particular, implement the Bernoulli Naive Bayes model from scratch (this will first require you to compute the MLE estimates). The inputs of this function are the training data and the new data (either validation or testing data). The function needs to output the accuracy on both the training data and the new data. Note that some feature values might exist in the validation/testing data that do not exist in the training data; in that case, set the probability of that feature value to a small value, for example, 0.1. Note: you should NOT use any related MATLAB toolbox functions, e.g., NaiveBayes.fit, to implement Naive Bayes. (A minimal sketch of a Bernoulli Naive Bayes classifier appears at the end of this section.)
- Compare the four algorithms (kNN, Naive Bayes, Decision Tree, and Logistic Regression) on the provided dataset. For each algorithm, report accuracies as detailed below, and describe the relative performance of these algorithms in a few sentences.
kNN: Report results from HW1.
Decision Tree: Train decision trees using the function ClassificationTree.fit or fitctree in MATLAB. Report the training, validation, and test accuracy for the two split criteria (Gini index and cross-entropy, selected via the SplitCriterion attribute) and for minimum leaf sizes of 1, 2, ..., 10 (via the MinLeaf attribute). Thus, in total you will report results for 2 × 10 = 20 different cases. When training the decision trees, turn off pruning using the Prune attribute. (A sketch of this training grid appears at the end of this section.)
Naive Bayes: Report the training, validation and test accuracy.
Logistic Regression: Train multi-class logistic regression using the function mnrfit in MATLAB. Report the training, validation, and test accuracy.
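Below is a minimal, hedged MATLAB/Octave sketch of a Bernoulli Naive Bayes classifier, included only for orientation. It assumes binary (one-hot) features with one example per row, labels encoded as integers 1..K, and a hypothetical function name; the required naive.m interface (which must also report the training accuracy) may differ.

% Hypothetical helper, not the required naive.m signature.
% Usage sketch: acc_val = nb_sketch_accuracy(Xtrain, ytrain, Xval, yval);
function acc = nb_sketch_accuracy(Xtrain, ytrain, Xnew, ynew)
    K = max(ytrain);                  % assumes labels are 1..K
    D = size(Xtrain, 2);
    prior = zeros(K, 1);              % MLE of P(y = k)
    theta = zeros(K, D);              % MLE of P(x_j = 1 | y = k)
    for k = 1:K
        idx = (ytrain == k);
        prior(k) = mean(idx);
        theta(k, :) = mean(Xtrain(idx, :), 1);
    end
    % Clamp estimates away from 0 and 1 so feature values unseen in training
    % receive a small probability (the handout suggests e.g. 0.1) instead of zero.
    eps_p = 0.1;
    theta = min(max(theta, eps_p), 1 - eps_p);

    % Log-posterior (up to an additive constant) of each class for each example.
    logprior = log(prior');                                     % 1 x K
    logpost = Xnew * log(theta') + (1 - Xnew) * log(1 - theta') ...
              + repmat(logprior, size(Xnew, 1), 1);
    [~, yhat] = max(logpost, [], 2);
    acc = mean(yhat == ynew(:));
end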
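The decision-tree grid described above can be produced with a loop like the hedged sketch below, assuming numeric labels and placeholder variable names (Xtrain, ytrain, Xval, yval, Xtest, ytest). Newer MATLAB releases use fitctree with the MinLeafSize name-value pair, while older releases use ClassificationTree.fit with MinLeaf.

criteria = {'gdi', 'deviance'};      % Gini index and cross-entropy, respectively
for c = 1:numel(criteria)
    for minleaf = 1:10
        tree = fitctree(Xtrain, ytrain, ...
                        'SplitCriterion', criteria{c}, ...
                        'MinLeafSize', minleaf, ...
                        'Prune', 'off');
        acc_tr  = mean(predict(tree, Xtrain) == ytrain);
        acc_val = mean(predict(tree, Xval)   == yval);
        acc_te  = mean(predict(tree, Xtest)  == ytest);
        fprintf('%s, MinLeaf = %2d: train %.3f, val %.3f, test %.3f\n', ...
                criteria{c}, minleaf, acc_tr, acc_val, acc_te);
    end
end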
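Finally, a hedged usage sketch of multi-class logistic regression with mnrfit/mnrval, assuming the labels are positive integers 1..K and the same placeholder variable names; the training and test accuracies follow the same pattern.

B = mnrfit(Xtrain, ytrain);          % fit a nominal multinomial logistic model
probs = mnrval(B, Xval);             % class probabilities for the validation data
[~, yhat_val] = max(probs, [], 2);   % predicted class = most probable class
acc_val = mean(yhat_val == yval(:));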