Question-1
Use the MNIST dataset from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
for this question and select two digits, 0 and 1. Label them as -1 and
+1. In this exercise you will implement AdaBoost.M1. Perform the following
tasks.
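The subsetting and relabelling step can be sketched as below. This is a minimal illustration, not the prescribed solution; the helper name `filter_binary` and the synthetic stand-in arrays are my own, and with the real data you would pass the arrays returned by `tensorflow.keras.datasets.mnist.load_data()` (or loaded from the `.npz` URL above).

```python
import numpy as np

def filter_binary(x, y):
    """Keep only digits 0 and 1, flatten each image, relabel 0 -> -1, 1 -> +1.
    (Illustrative helper; names are not prescribed by the task.)"""
    mask = (y == 0) | (y == 1)
    x = x[mask].reshape(mask.sum(), -1).astype(np.float64)
    y = np.where(y[mask] == 0, -1, 1)
    return x, y

# With the real data you would call, e.g.:
#   (x_train, y_train), (x_test, y_test) = mnist.load_data()
#   x_train, y_train = filter_binary(x_train, y_train)

# Synthetic stand-in: five 2x2 "images" with digit labels 0, 1, 7, 0, 1.
x = np.arange(20).reshape(5, 2, 2)
y = np.array([0, 1, 7, 0, 1])
xb, yb = filter_binary(x, y)
print(xb.shape, yb.tolist())  # (4, 4) [-1, 1, -1, 1]
```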
• Divide the train set into train and validation (val) sets. Keep 1000 samples
from each class for val. Note that the val set should be used only to evaluate
the performance of the classifier; it must not be used in obtaining the PCA matrix.
• Apply PCA and reduce the dimension to p = 5. Use the train set of the
two classes to obtain the PCA matrix. For the remaining parts, use the
reduced-dimension dataset.
• Now learn a decision tree using the train set; you need to grow a decision
stump. For each dimension, find the unique values and sort them in ascending
order. The candidate splits are the midpoints of consecutive unique values.
Find the best split by minimizing the weighted misclassification error. Denote
this stump as h1(x). Note that since we are dealing with real numbers, every
value may be unique, so simply sorting and taking midpoints of consecutive
values may still yield a similar tree. [2]
• Compute α1 and update the weights.
• Now build another stump h2(x) using the train set but with the updated weights.
Compute α2 and update the weights again. Similarly, grow 300 such stumps.
• After every iteration, find the accuracy on the val set and report it. Show
a plot of val-set accuracy vs. number of trees. Use the ensemble size (number
of trees) that gives the highest val accuracy and evaluate that ensemble on
the test set. Report the test accuracy. [2]
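The boosting loop above can be sketched as follows. This is a minimal, illustrative implementation assuming labels in {-1, +1}; all function and variable names are my own, and the O(n²) stump search shown here would need vectorizing (e.g. via cumulative weighted sums) to run quickly on the full MNIST subset.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold stump under weighted 0/1 error.
    Candidate splits are midpoints of consecutive sorted unique values."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2:
            pred = np.where(X[:, j] <= t, -1, 1)
            for pol in (1, -1):             # try both orientations of the split
                err = np.sum(w[pol * pred != y])
                if err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(X, feature, threshold, polarity):
    return polarity * np.where(X[:, feature] <= threshold, -1, 1)

def adaboost(X, y, X_val, y_val, T=300):
    w = np.full(len(y), 1.0 / len(y))       # uniform initial weights
    stumps, alphas, val_acc = [], [], []
    F_val = np.zeros(len(y_val))            # running weighted vote on val set
    for _ in range(T):
        err, j, t, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)               # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, pol)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified samples
        w /= w.sum()
        F_val += alpha * stump_predict(X_val, j, t, pol)
        val_acc.append(np.mean(np.sign(F_val) == y_val))
        stumps.append((j, t, pol)); alphas.append(alpha)
    return stumps, alphas, val_acc
```

`val_acc` holds the accuracy after each iteration, which is what the required plot of val accuracy vs. number of trees is built from.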
Q2. Consider the above as a regression problem. Apply gradient boosting
using absolute loss and report the MSE between the predicted and actual values
on the test set.
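For absolute loss L(y, F) = |y - F|, the negative gradient with respect to the current prediction F is sign(y - F); these signs become the pseudo-labels each new stump is fit to. A small sketch (function name is illustrative):

```python
import numpy as np

def negative_gradient_abs(y, F):
    """Pseudo-labels for gradient boosting with absolute loss:
    -dL/dF of L = |y - F| is sign(y - F)."""
    return np.sign(y - F)

y = np.array([1.0, -1.0, 1.0])
F = np.array([0.2, 0.5, 2.0])
print(negative_gradient_abs(y, F).tolist())  # [1.0, -1.0, -1.0]
```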
• Divide the train set into train and validation (val) sets. Keep 1000 samples
from each class for val. Note that the val set should be used only to evaluate
the performance of the model; it must not be used in obtaining the PCA matrix.
• Apply PCA and reduce the dimension to p = 5. Use the train set of the
two classes to obtain the PCA matrix. For the remaining parts, use the
reduced-dimension dataset.
• Now learn a decision tree using the train set; you need to grow a decision
stump. For each dimension, find the unique values and sort them in ascending
order. The candidate splits are the midpoints of consecutive unique values.
Find the best split by minimizing the SSR (sum of squared residuals). Denote
this stump as h1(x). [1]
• Compute the residual as y − 0.01 h1(x).
• Now build another stump h2(x) using the train set but with updated labels.
Note that you now update the labels in the way required for absolute loss:
the new labels are the negative gradients. Compute the residual as
y − 0.01 h1(x) − 0.01 h2(x). [1]
• Similarly, grow 300 such stumps. Note that the labels are updated at every iteration based on the negative gradients.
• After every iteration, find the MSE on the val set and report it. Show a
plot of val-set MSE vs. number of trees. Use the ensemble size (number of
trees) that gives the lowest val MSE and evaluate that ensemble on the test
set. Report the test MSE. [1]
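The regression loop above can be sketched as below: each round fits an SSR-minimizing stump to the signs of the current residuals (the negative gradient of absolute loss) and adds it with the task's fixed shrinkage of 0.01. Function and variable names are illustrative, and as with the classification sketch, the brute-force split search would need vectorizing for full-size data.

```python
import numpy as np

def fit_reg_stump(X, r):
    """Stump minimizing SSR on pseudo-labels r; candidate splits are midpoints
    of consecutive sorted unique values in each dimension."""
    best = (np.inf, 0, 0.0, 0.0, 0.0)  # (ssr, feature, threshold, c_left, c_right)
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2:
            left = X[:, j] <= t
            cl, cr = r[left].mean(), r[~left].mean()  # leaf means minimize SSR
            ssr = np.sum((r[left] - cl) ** 2) + np.sum((r[~left] - cr) ** 2)
            if ssr < best[0]:
                best = (ssr, j, t, cl, cr)
    return best[1:]

def reg_stump_predict(X, feature, threshold, cl, cr):
    return np.where(X[:, feature] <= threshold, cl, cr)

def gradient_boost_abs(X, y, X_val, y_val, T=300, lr=0.01):
    F, F_val = np.zeros(len(y)), np.zeros(len(y_val))
    stumps, val_mse = [], []
    for _ in range(T):
        r = np.sign(y - F)                  # negative gradient of |y - F|
        j, t, cl, cr = fit_reg_stump(X, r)
        F += lr * reg_stump_predict(X, j, t, cl, cr)
        F_val += lr * reg_stump_predict(X_val, j, t, cl, cr)
        val_mse.append(np.mean((y_val - F_val) ** 2))
        stumps.append((j, t, cl, cr))
    return stumps, val_mse
```

`val_mse` holds the MSE after each iteration, which supplies the required plot of val MSE vs. number of trees and the ensemble size to carry to the test set.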
CSE342 SML Assignment 4