(1) [100 points] LFD Exercise 4.3.
(2) [100 points] LFD Exercise 4.6.
(3) [100 points] LFD Exercise 4.8.
(4) [100 points] LFD Exercise 4.11.
(5) [1000 points] An End-to-End Learning System with Regularization and Validation: Predicting 1s vs. Not 1s.
We revisit the MNIST Handwritten Digits Dataset we worked with in the last homework to solve the problem of
predicting whether a given image of a handwritten digit represents either the digit 1 or not the digit 1, i.e., if the n-th
example is labeled as being the digit 1, then yn = +1, and otherwise yn = −1.
First, you must perform the following steps to prepare to solve the problem:
(1) [Combine Data] Combine the training and test sets (in ZipDigits.train and ZipDigits.test respectively) into a single dataset.
(2) [Compute Features] Use the algorithms you developed in the previous homework to compute two features for
each example in the dataset.
(3) [Normalize Features] Shift and then rescale the values of each feature across the entire dataset so that the values of every feature lie in the range [−1, 1].
(4) [Create Input and Test Datasets] Select 300 data points from the dataset uniformly at random (and without replacement) to form the dataset D. Use the remaining data points to form the test dataset Dtest. You must set Dtest aside and not touch it until the end of this exercise, when we will compute the final output g of our learning system. We will use Dtest to estimate Eout(g). (A short sketch of these preparation steps follows this list.)
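A minimal sketch of these preparation steps, assuming numpy and that the two features from the previous homework are already stored in an N × 2 array named features with the ±1 labels in y (both names are placeholders):

```python
import numpy as np

rng = np.random.default_rng()

# features: N x 2 array of the two features from the previous homework (placeholder name)
# y: length-N array of +/-1 labels (placeholder name)

# [Normalize Features] shift and rescale each feature so its values span [-1, 1]
lo, hi = features.min(axis=0), features.max(axis=0)
X = 2.0 * (features - lo) / (hi - lo) - 1.0

# [Create Input and Test Datasets] 300 points chosen uniformly at random form D,
# the remaining points form Dtest (set aside until the very end)
idx = rng.permutation(len(X))
X_train, y_train = X[idx[:300]], y[idx[:300]]
X_test,  y_test  = X[idx[300:]], y[idx[300:]]
```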
For convenience, we will treat this as a regression problem with real-valued targets ±1, until we output our final hypothesis g for classifying handwritten images as either the digit 1 or not the digit 1, at which point we will use sign(g(x)) to predict the class of a test data point x.
The standard polynomial feature transform generates features which are not "orthogonal", making the columns in the data matrix dependent. This can be problematic for the one-step linear regression algorithm, as it requires computing the (pseudo-)inverse of a matrix. An "orthogonal" polynomial transform is
(x1, x2) → (1, L1(x1), L1(x2), L2(x1), L1(x1)L1(x2), L2(x2), L3(x1), L2(x1)L1(x2), . . .),
where Lk(xi) is the k-th order Legendre polynomial applied to the i-th feature of the input data point x. See LFD Problem 4.3 for a recursive expression that defines the Legendre polynomials, which can be implemented as an efficient iterative algorithm.
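A minimal sketch of an iterative implementation of the standard Legendre recursion, L_k(x) = ((2k−1)/k) x L_{k−1}(x) − ((k−1)/k) L_{k−2}(x) with L_0(x) = 1 and L_1(x) = x (which should match the recursive expression in LFD Problem 4.3), together with one possible implementation of the orthogonal transform in the term order shown above:

```python
import numpy as np

def legendre(x, K):
    """Return [L_0(x), L_1(x), ..., L_K(x)] via the recursion
    L_k(x) = ((2k-1)/k) x L_{k-1}(x) - ((k-1)/k) L_{k-2}(x)."""
    L = [np.ones_like(x), x]
    for k in range(2, K + 1):
        L.append(((2 * k - 1) / k) * x * L[k - 1] - ((k - 1) / k) * L[k - 2])
    return L[:K + 1]

def legendre_transform(X, K=10):
    """Map an N x 2 array of normalized features to the matrix Z, one column
    per product L_i(x1) L_j(x2) with i + j <= K, grouped by total degree."""
    X = np.asarray(X, dtype=float)
    L1, L2 = legendre(X[:, 0], K), legendre(X[:, 1], K)
    cols = []
    for deg in range(K + 1):          # group terms by total degree
        for i in range(deg, -1, -1):  # within a degree: highest power of x1 first
            cols.append(L1[i] * L2[deg - i])
    return np.column_stack(cols)
```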
We will use the one-step (pseudo-inverse) linear regression algorithm with weight decay regularization for learning. This corresponds to minimizing the augmented error Eaug(w) = Ein(w) + λ wᵀw, where Ein(w) is the sum of squared errors. The weights that minimize Eaug(w) are wlin(λ) = (ZᵀZ + λI)⁻¹Zᵀy, where Z is the N × d̃ matrix generated from the polynomial transform of the data points X in the data set D = (X, y) and wlin is the d̃ × 1 regularized weight vector.
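A minimal sketch of this one-step solution, solving the linear system rather than forming the inverse explicitly:

```python
import numpy as np

def ridge_weights(Z, y, lam):
    """One-step weight decay solution wlin(lambda) = (Z'Z + lambda*I)^{-1} Z'y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
```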
Now, complete the following tasks to arrive at the final hypothesis g.
(Task 1) [100 points] 10-th order Polynomial Transform. Use the 10-th order Legendre polynomial feature transform to compute Z. Report the dimensions of Z.
(Task 2) [100 points] Overfitting. Plot the decision boundary of the output of the regularized linear regression algorithm without any regularization (λ = 0). What do you observe, overfitting or underfitting? (A plotting sketch is given after the task list.)
(Task 3) [100 points] Regularization. Plot the decision boundary of the output of the regularized linear regression algorithm with λ = 3. Do you observe overfitting or underfitting?
(Task 4) [200 points] Cross Validation. Use leave-one-out cross validation to estimate ECV(λ) for λ ∈ {0, 0.01, 0.1, 1, 5, 10, 25, 50, 75, 100}. Plot ECV versus λ and Etest(wlin(λ)) versus λ on the same plot. Comment on the behavior of ECV and Etest versus λ. Here, ECV and Etest are the regression (sum of squared) errors. (See the cross-validation sketch after the task list.)
(Task 5) [100 points] Pick λ. Use the cross validation errors from the previous step to pick the best value of λ, and call it λ*. Plot the decision boundary corresponding to the weights wlin(λ*).
(Task 6) [100 points] Estimate Classification Error. Use wlin(λ*) for classification and estimate the out-of-sample classification error Eout(wlin(λ*)) for your final hypothesis g. Estimate Eout(g) for distinguishing between digits that are 1s and those that are not (give the 99% error bar). (See the error-bar sketch after the task list.)
(Task 7) [100 points] Is ECV biased? Comment on whether ECV(λ*) is an unbiased estimator of Etest(wlin(λ*)) (treated as regression errors). Why or why not?
(Task 8) [200 points] Data snooping. Is Etest(wlin(λ*)) an unbiased estimator of Eout(wlin(λ*)) (treat them as classification errors)? Why or why not? If not, what could we do differently to fix things so that it is? Explain.
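For the decision-boundary plots in Tasks 2, 3, and 5, one possible approach (a sketch that assumes the legendre_transform and ridge_weights helpers sketched above, plus matplotlib) is to evaluate wᵀz(x) on a grid over [−1, 1]² and draw its zero contour:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(w, X, y, K=10, n=200):
    """Draw the zero contour of w' z(x) over [-1, 1]^2 with the data overlaid."""
    g1, g2 = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    vals = legendre_transform(np.column_stack([g1.ravel(), g2.ravel()]), K) @ w
    plt.contour(g1, g2, vals.reshape(n, n), levels=[0.0], colors="k")
    plt.scatter(X[y == +1, 0], X[y == +1, 1], marker="o", label="digit 1")
    plt.scatter(X[y == -1, 0], X[y == -1, 1], marker="x", label="not 1")
    plt.xlabel("feature 1"); plt.ylabel("feature 2"); plt.legend(); plt.show()
```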
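For Task 4, the leave-one-out estimate can be computed with a direct loop over the 300 training points. A sketch is below, where Z_train = legendre_transform(X_train) and y_train denote the transformed training set from the earlier sketches; for linear regression with squared error there is also an analytic shortcut, ECV(λ) = (1/N) Σ_n ((y_n − ŷ_n)/(1 − H_nn(λ)))² with H(λ) = Z(ZᵀZ + λI)⁻¹Zᵀ, which avoids refitting N times.

```python
def loocv_error(Z, y, lam):
    """Leave-one-out cross-validation estimate of the squared regression error."""
    N = len(y)
    errs = []
    for n in range(N):
        mask = np.arange(N) != n                   # leave point n out
        w = ridge_weights(Z[mask], y[mask], lam)   # refit without point n
        errs.append((Z[n] @ w - y[n]) ** 2)        # squared error on the held-out point
    return np.mean(errs)

lambdas = [0, 0.01, 0.1, 1, 5, 10, 25, 50, 75, 100]
e_cv = [loocv_error(Z_train, y_train, lam) for lam in lambdas]
```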
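For the 99% error bar in Task 6, one option is the Hoeffding bound on the held-out test set, ε = sqrt(ln(2/δ)/(2·|Dtest|)) with δ = 0.01 (a sketch; w_star below is a placeholder name for wlin(λ*)):

```python
# classification test error of the final hypothesis g(x) = sign(w* . z(x))
y_pred = np.sign(legendre_transform(X_test) @ w_star)
e_test = np.mean(y_pred != y_test)

# 99% Hoeffding error bar: |Eout(g) - Etest(g)| <= eps with probability >= 0.99
eps = np.sqrt(np.log(2 / 0.01) / (2 * len(y_test)))
print(f"Eout(g) ~ {e_test:.4f} +/- {eps:.4f}")
```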