1 Bias-Variance Tradeoff
Consider a dataset with $n$ data points $(x_i, y_i)$, $x_i \in \mathbb{R}^{p \times 1}$, drawn from the following linear model:
$$y = x^\top \beta^* + \epsilon,$$
where $\epsilon$ is Gaussian noise and the star sign is used to differentiate the true parameter from the estimators that will be introduced later. Consider the $\ell_2$-regularized linear regression as follows:
$$\hat{\beta}_\lambda = \operatorname*{argmin}_{\beta} \; \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \lambda \|\beta\|_2^2,$$
where $\lambda \geq 0$ is the regularization parameter and $X \in \mathbb{R}^{n \times p}$ denotes the matrix obtained by stacking $x_i^\top$ in each row. Properties of an affine transformation of a Gaussian random variable will be useful throughout this problem.
- Find the closed-form solution for $\hat{\beta}_\lambda$ and its distribution.
- Calculate the bias term $\mathbb{E}[x^\top \hat{\beta}_\lambda] - x^\top \beta^*$ as a function of $\lambda$ and some fixed test point $x$.
- Calculate the variance term $\mathbb{E}\big[\big(x^\top \hat{\beta}_\lambda - \mathbb{E}[x^\top \hat{\beta}_\lambda]\big)^2\big]$ as a function of $\lambda$ and some fixed test point $x$.
- Use the results from parts (b) and (c) and the bias-variance theorem to analyze the impact of $\lambda$ on the squared error. Specifically, which term dominates when $\lambda$ is small or large? (A numerical sketch illustrating the tradeoff follows this list.)
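The following is a minimal Matlab sketch of this tradeoff, assuming the closed form $\hat{\beta}_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$ (which part (a) asks you to derive); the sizes, noise level, and number of trials are illustrative choices, not part of the problem.

```matlab
% Empirical illustration of the bias-variance tradeoff in ridge regression.
% Assumes the closed form beta_hat = (X'X + lambda*I)^(-1) * X'y from part (a).
n = 100; p = 10; sigma = 1;           % illustrative sizes and noise level
trials = 2000;                        % Monte Carlo repetitions
beta_star = randn(p, 1);              % true parameter
X = randn(n, p);                      % fixed design matrix
x_test = randn(p, 1);                 % fixed test point
lambdas = 10.^(-3:0.5:3);
bias2 = zeros(size(lambdas)); vars = zeros(size(lambdas));
for k = 1:numel(lambdas)
    A = (X'*X + lambdas(k)*eye(p)) \ X';   % maps y to beta_hat
    preds = zeros(trials, 1);
    for t = 1:trials
        y = X*beta_star + sigma*randn(n, 1);
        preds(t) = x_test' * (A*y);        % x'*beta_hat for this draw
    end
    bias2(k) = (mean(preds) - x_test'*beta_star)^2;
    vars(k) = var(preds);
end
% Bias^2 grows with lambda while variance shrinks; their sum is U-shaped.
loglog(lambdas, bias2, '-o', lambdas, vars, '-s');
xlabel('\lambda'); legend('bias^2', 'variance');
```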
2 Kernelized Perceptron
Given a set of training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ where $y_i \in \{-1, 1\}$, the Perceptron algorithm learns a weight vector $w$ by iterating through all training samples. For each $x_i$, if the prediction is incorrect, we update $w$ by $w \leftarrow w + y_i x_i$. Now we would like to kernelize the Perceptron algorithm. Assume we map $x$ to $\phi(x)$ through a nonlinear feature mapping, and we want to learn a new weight vector $w$ that makes predictions by $y = \operatorname{sign}(w^\top \phi(x))$. Further assume that we initialize the algorithm with $w = 0$.
- Show that $w$ is always a linear combination of the feature vectors, i.e., $w = \sum_{i=1}^{N} \alpha_i \phi(x_i)$.
- Show that while the update rule for $w$ for a kernelized Perceptron does depend on the explicit feature mapping $\phi(x)$, the prediction can be re-expressed so that it depends only on the inner products between nonlinearly transformed features.
- Show that we do not need to explicitly store $w$ at training or test time. Instead, we can use it implicitly by maintaining all the $\alpha_i$. Please give an outline of the algorithm that would allow us to not store $w$. You should indicate how $\alpha_i$ is initialized, when to update $\alpha_i$, and how it is updated. (A sketch of such an algorithm follows.)
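The following Matlab sketch outlines one such algorithm, assuming a precomputed kernel matrix K with $K_{ij} = k(x_i, x_j)$ and a fixed number of passes over the data; the function name and interface are illustrative.

```matlab
% Kernelized Perceptron: never form w = sum_i alpha_i * phi(x_i) explicitly;
% maintain the coefficients alpha instead. K(i,j) = k(x_i, x_j).
function alpha = kernel_perceptron(K, y, epochs)
    N = numel(y);
    alpha = zeros(N, 1);               % alpha_i = 0 for all i (i.e., w = 0)
    for e = 1:epochs
        for i = 1:N
            % w'*phi(x_i) = sum_j alpha_j * K(j,i): inner products only
            if sign(alpha' * K(:, i)) ~= y(i)   % misclassified
                alpha(i) = alpha(i) + y(i);     % w <- w + y_i*phi(x_i)
            end
        end
    end
end
```

At test time, the prediction at a new point $x$ is $\operatorname{sign}\big(\sum_j \alpha_j k(x_j, x)\big)$, so $w$ itself never needs to be formed.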
3 Kernels
Mercer's theorem implies that a bivariate function $k(\cdot,\cdot)$ is a positive definite kernel function iff, for any $N$ and any $x_1, x_2, \ldots, x_N$, the corresponding kernel matrix $K$ is positive semidefinite, where $K_{ij} = k(x_i, x_j)$. Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if all of its eigenvalues are non-negative, or equivalently, if $x^\top A x \geq 0$ for an arbitrary vector $x \in \mathbb{R}^n$ [1].
Suppose $k_1(\cdot,\cdot)$ and $k_2(\cdot,\cdot)$ are positive definite kernel functions with corresponding kernel matrices $K_1$ and $K_2$. Use Mercer's theorem to show that the following kernel functions are positive definite. (A numerical sanity check follows the list.)
- $K_3 = a_1 K_1 + a_2 K_2$, for $a_1, a_2 \geq 0$.
- $K_4$ defined by $k_4(x, x') = f(x) f(x')$, where $f(\cdot)$ is an arbitrary real-valued function.
- $K_5$ defined by $k_5(x, x') = k_1(x, x')\, k_2(x, x')$.
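As a numerical sanity check (not a proof), one can verify on random data that the constructed kernel matrices have non-negative eigenvalues. In this Matlab sketch, the choices of $k_1$, $k_2$, and $f$ are arbitrary illustrations.

```matlab
% Numerical sanity check that K3, K4, K5 are PSD on random data.
N = 50; X = randn(N, 5);
k1 = @(a, b) exp(-norm(a - b)^2);      % an RBF kernel (illustrative k1)
k2 = @(a, b) a' * b;                   % the linear kernel (illustrative k2)
f  = @(a) sin(a(1)) + a(2)^2;          % an arbitrary real-valued function
K1 = zeros(N); K2 = zeros(N); K4 = zeros(N);
for i = 1:N
    for j = 1:N
        K1(i,j) = k1(X(i,:)', X(j,:)');
        K2(i,j) = k2(X(i,:)', X(j,:)');
        K4(i,j) = f(X(i,:)') * f(X(j,:)');
    end
end
K3 = 2*K1 + 3*K2;                      % a1 = 2 >= 0, a2 = 3 >= 0
K5 = K1 .* K2;                         % elementwise (Schur) product
% Minimum eigenvalues should be >= 0 up to numerical tolerance.
fprintf('min eig: K3 = %.2e, K4 = %.2e, K5 = %.2e\n', ...
        min(eig(K3)), min(eig(K4)), min(eig(K5)));
```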
4 Soft Margin Hyperplanes
The function of the slack variables used in the optimization problem for soft margin hyperplanes has the form $\sum_i \xi_i$. Instead, we could use $\sum_i \xi_i^p$, with $p > 1$.
- Give the dual formulation of the problem in this general case. (For reference, the standard $p = 1$ dual is recalled after this list.)
- How does this more general formulation (p > 1) compare to the standard setting (p = 1) discussed in lecture? Is the general formulation more or less complex? Justify your answer.
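For reference, the dual of the standard $p = 1$ soft margin problem derived in lecture is:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C \;\; \forall i.$$

Your derivation for $p > 1$ can be compared against this form; note in particular how the box constraint $\alpha_i \leq C$ arises from the linear slack penalty.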
5 Programming
In this problem, you will experiment with SVMs on a real-world dataset. You will implement a linear SVM (i.e., an SVM using the original features). You will also use a widely used SVM toolbox called LibSVM to experiment with kernel SVMs.
Dataset: We have provided the Splice Dataset from UCI's machine learning data repository. The provided binary classification dataset has 60 input features, and the training and test sets contain 1,000 and 2,175 samples, respectively (the files are called splice_train.mat and splice_test.mat).
5.1 Data preprocessing
Preprocess the training and test data by
- computing the mean of each dimension and subtracting it from each dimension
- dividing each dimension by its standard deviation
Notice that the mean and standard deviation should be estimated from the training data and then applied to both datasets. Explain why this is the case. Also, report the mean and the standard deviation of the third and 10th features on the test data.
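A minimal Matlab sketch of this preprocessing, assuming the loaded feature matrices have one sample per row (the variable names X_train and X_test are illustrative):

```matlab
% Standardize with statistics estimated on the TRAINING data only, then
% apply the same transformation to both sets.
mu = mean(X_train, 1);                 % per-dimension mean (1 x 60)
sd = std(X_train, 0, 1);               % per-dimension std  (1 x 60)
X_train = (X_train - repmat(mu, size(X_train, 1), 1)) ...
          ./ repmat(sd, size(X_train, 1), 1);
X_test  = (X_test  - repmat(mu, size(X_test, 1), 1)) ...
          ./ repmat(sd, size(X_test, 1), 1);
% Reusing training statistics avoids leaking test-set information and keeps
% the two sets on a common scale.
```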
5.2 Implement linear SVM
Please fill in the Matlab functions trainsvm in trainsvm.m and testsvm in testsvm.m.
The input of trainsvm contains training feature vectors and labels, as well as the tradeoff parameter C. The output of trainsvm contains the SVM parameters (weight vector and bias). In your implementation, you need to solve the SVM in its primal form:
$$\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i$$
$$\text{s.t.} \;\; y_i (w^\top x_i + b) \geq 1 - \xi_i, \;\; \forall i$$
$$\xi_i \geq 0, \;\; \forall i$$
Please use the quadprog function in Matlab to solve the above quadratic problem.
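One way to map the primal above onto quadprog's canonical form $\min_z \frac{1}{2} z^\top H z + f^\top z$ s.t. $Az \leq b$, $lb \leq z$, using the stacked variable $z = [w; b; \xi]$, is sketched below; treat it as an illustrative starting point rather than the required implementation.

```matlab
function [w, b] = trainsvm(X, y, C)
% X: n x d features, y: n x 1 labels in {-1,+1}, C: tradeoff parameter.
% Stack the decision variable as z = [w (d); b (1); xi (n)].
[n, d] = size(X);
H = blkdiag(eye(d), 0, zeros(n));          % quadratic term: (1/2)*||w||^2
f = [zeros(d + 1, 1); C*ones(n, 1)];       % linear term: C*sum(xi)
% y_i*(w'*x_i + b) >= 1 - xi_i  <=>  -y_i*x_i'*w - y_i*b - xi_i <= -1
A = [-bsxfun(@times, y, X), -y, -eye(n)];
rhs = -ones(n, 1);
lb = [-inf(d + 1, 1); zeros(n, 1)];        % xi >= 0; w, b unconstrained
z = quadprog(H, f, A, rhs, [], [], lb, []);
w = z(1:d);
b = z(d + 1);
end
```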
For testsvm, the input contains testing feature vectors and labels, as well as SVM parameters. The output contains the test accuracy.
5.3 Cross validation for linear SVM
Use 5-fold cross validation to select the optimal C for your implementation of the linear SVM. (A sketch of the fold logic follows the questions below.)
- Report the cross-validation accuracy (averaged accuracy over each validation set) and average training time (averaged over each training subset) for different C taken from $\{4^{-6}, 4^{-5}, \ldots, 4, 4^2\}$. How does the value of C affect the cross validation accuracy and average training time? Explain your observations.
- Which C do you choose based on the cross validation results?
- For the selected C, report the test accuracy.
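A sketch of the fold logic, assuming the trainsvm interface above and labels in $\{-1, +1\}$; the deterministic fold assignment is one arbitrary choice, and a random permutation works as well.

```matlab
% 5-fold cross validation over candidate C values for the primal solver.
Cs = 4.^(-6:2);                        % {4^-6, ..., 4, 4^2}
n = size(X_train, 1);
fold = mod((1:n)' - 1, 5) + 1;         % fold id in 1..5 for every sample
cvacc = zeros(numel(Cs), 1);
for k = 1:numel(Cs)
    accs = zeros(5, 1);
    for v = 1:5
        tr = (fold ~= v); va = (fold == v);
        [w, b] = trainsvm(X_train(tr, :), y_train(tr), Cs(k));
        pred = sign(X_train(va, :)*w + b);
        accs(v) = mean(pred == y_train(va));
    end
    cvacc(k) = mean(accs);             % averaged validation accuracy
end
[~, best] = max(cvacc);
fprintf('selected C = %g\n', Cs(best));
```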
5.4 Use linear SVM in LibSVM
LibSVM is a widely used toolbox for SVMs, and it has a Matlab interface. Download LibSVM from http://www.csie.ntu.edu.tw/~cjlin/libsvm/ and install it according to the README file (make sure to use the Matlab interface provided in the LibSVM toolbox). For each C from $\{4^{-6}, 4^{-5}, \ldots, 4, 4^2\}$, apply 5-fold cross validation (use the -v option in LibSVM) and report the cross validation accuracy and average training time. (A usage sketch follows the questions below.)
- Is the cross validation accuracy the same as that in Section 5.3? Note that LibSVM solves the linear SVM in its dual form, while your implementation does it in the primal form.
- How does LibSVM compare with your implementation in terms of training time?
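With LibSVM's Matlab interface, the cross validation loop looks as follows (a sketch; svmtrain is the name used by the stock LibSVM Matlab interface, -t 0 selects the linear kernel, and -q suppresses output).

```matlab
% Linear kernel (-t 0) with built-in 5-fold cross validation (-v 5).
Cs = 4.^(-6:2);
for k = 1:numel(Cs)
    opts = sprintf('-t 0 -c %g -v 5 -q', Cs(k));
    tic;
    acc = svmtrain(y_train, X_train, opts);   % with -v, returns CV accuracy
    fprintf('C = %-10g CV acc = %5.2f%%  time = %.2fs\n', Cs(k), acc, toc);
end
```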
5.5 Use kernel SVM in LibSVM
LibSVM supports a number of kernel types. Here you need to experiment with the polynomial kernel and RBF (Radial Basis Function) kernel.
- Polynomial kernel. Please tune C and the degree of the kernel. For each combination of (C, degree), where $C \in \{4^{3}, 4^{4}, \ldots, 4^{6}, 4^{7}\}$ and degree $\in \{1, 2, 3\}$, report the 5-fold cross validation accuracy and average training time.
- RBF kernel. Please tune C and gamma in the kernel. For each combination of (C, gamma), where $C \in \{4^{3}, 4^{4}, \ldots, 4^{6}, 4^{7}\}$ and gamma $\in \{4^{-7}, 4^{-6}, \ldots, 4^{1}, 4^{2}\}$, report the 5-fold cross validation accuracy and average training time.
Based on the cross validation results of the polynomial and RBF kernels, which kernel type and kernel parameters will you choose? Report the corresponding test accuracy for the configuration with the highest cross validation accuracy. (A grid-search sketch follows.)
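A sketch of the grid search using LibSVM's kernel options (-t 1 with -d for the polynomial kernel, -t 2 with -g for the RBF kernel), over the value ranges listed above:

```matlab
% Polynomial kernel: -t 1, degree set via -d.
for C = 4.^(3:7)
    for deg = 1:3
        acc = svmtrain(y_train, X_train, ...
                       sprintf('-t 1 -d %d -c %g -v 5 -q', deg, C));
        fprintf('poly: C = %g, degree = %d, CV acc = %.2f%%\n', C, deg, acc);
    end
end
% RBF kernel: -t 2, gamma set via -g.
for C = 4.^(3:7)
    for g = 4.^(-7:2)
        acc = svmtrain(y_train, X_train, ...
                       sprintf('-t 2 -g %g -c %g -v 5 -q', g, C));
        fprintf('rbf:  C = %g, gamma = %g, CV acc = %.2f%%\n', C, g, acc);
    end
end
```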