- Submit a PDF of your homework, with an appendix listing all your code, to the Gradescope assignment entitled Homework 7 Write-Up. In addition, please include, as your solutions to each coding problem, the specific subset of code relevant to that part of the problem. You may typeset your homework in LaTeX or Word (submit PDF format, not .doc/.docx format) or submit neatly handwritten and scanned solutions. Please start each question on a new page. If there are graphs, include those graphs in the correct sections. Do not put them in an appendix. We need each solution to be self-contained on pages of its own.
- In your write-up, please state with whom you worked on the homework.
- In your write-up, please copy the following statement and sign your signature next to it. (Mac Preview and Foxit PDF Reader, among others, have tools to let you sign a PDF file.) We want to make it extra clear so that no one inadvertently cheats.
I certify that all solutions are entirely in my own words and that I have not looked at another student's solutions. I have given credit to all external sources I consulted.
- Submit all the code needed to reproduce your results to the Gradescope assignment entitled Homework 7 Code. Yes, you must submit your code twice: once in your PDF write-up following the directions as described above so the readers can easily read it, and once in compilable/interpretable form so the readers can easily run it. Do NOT include any data files we provided. Please include a short file named README listing your name, student ID, and instructions on how to reproduce your results. Please take care that your code doesn't take up inordinate amounts of time or memory. If your code cannot be executed, your solution cannot be verified.
1 Regularized and Kernel k-Means
Recall that in k-means clustering we attempt to minimize the objective

$$\min_{C_1, C_2, \ldots, C_k} \; \sum_{i=1}^{k} \sum_{x_j \in C_i} \|x_j - \mu_i\|_2^2, \quad \text{where}$$

$$\mu_i = \operatorname*{argmin}_{\mu \in \mathbb{R}^d} \sum_{x_j \in C_i} \|x_j - \mu\|_2^2 = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j, \qquad i = 1, 2, \ldots, k.$$

The samples are $\{x_1, \ldots, x_n\}$, where $x_j \in \mathbb{R}^d$. $C_i$ is the set of sample points assigned to cluster $i$ and $|C_i|$ is its cardinality. Each sample point is assigned to exactly one cluster.
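For concreteness, below is a minimal sketch of the standard alternating minimization (Lloyd's algorithm) that this objective suggests: repeat the assignment step and the mean-update step until the means stop moving. The function name `kmeans` and its signature are illustrative, not part of the assignment.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest
    mean, then recompute each mean as the centroid of its cluster."""
    rng = np.random.default_rng(seed)
    # Initialize the k means as k distinct sample points.
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: squared distance of every x_j to every mu_i.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mu_i = (1 / |C_i|) * sum of the points in C_i.
        new_mu = np.array([X[labels == i].mean(axis=0) if (labels == i).any()
                           else mu[i] for i in range(k)])
        if np.allclose(new_mu, mu):  # converged: means stopped moving
            break
        mu = new_mu
    return labels, mu
```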
- What is the minimum value of the objective when k = n (the number of clusters equals the number of sample points)?
- (Regularized k-means) Suppose we add a regularization term to the above objective. The objective is now

$$\sum_{i=1}^{k} \left( \lambda \|\mu_i\|_2^2 + \sum_{x_j \in C_i} \|x_j - \mu_i\|_2^2 \right).$$

Show that the optimum of

$$\min_{\mu \in \mathbb{R}^d} \; \lambda \|\mu\|_2^2 + \sum_{x_j \in C_i} \|x_j - \mu\|_2^2$$

is obtained at $\mu_i = \frac{1}{|C_i| + \lambda} \sum_{x_j \in C_i} x_j$.
- Here is an example where we would want to regularize clusters. Suppose there are $n$ students who live in a Euclidean world $\mathbb{R}^2$ and who wish to share rides efficiently to Berkeley for their final exam in CS 189. The university permits $k$ vehicles, which may be used for shuttling students to the exam location. The students need to figure out $k$ good locations to meet up. Each student will walk to the closest meet-up point, and then the shuttles will deliver them to the exam location. Let $x_j$ be the location of student $j$, and let the exam location be at $(0, 0)$. Assume that we can drive as the crow flies, i.e., by taking the shortest path between two points. Write down an appropriate objective function to minimize the total distance that the students and vehicles need to travel. Hint: your result should be similar to the regularized k-means objective.
- (Kernel k-means) Suppose we have a dataset $\{x_i\}_{i=1}^{n}$, $x_i \in \mathbb{R}^\ell$, that we want to split into $k$ clusters, i.e., find the best k-means clustering without the regularization. Furthermore, suppose we know a priori that this data is best clustered in an impractically high-dimensional feature space $\mathbb{R}^m$ with an appropriate metric. Fortunately, instead of having to deal with the (implicit) feature map $\Phi: \mathbb{R}^\ell \to \mathbb{R}^m$ and (implicit) distance metric[1], we have a kernel function $\kappa(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2)$ that we can compute easily on the raw samples. How should we perform the kernelized counterpart of k-means clustering?
Derive the underlined portion of this algorithm, and show your work in deriving it. The main issue is that although we define the means $\mu_i$ in the usual way, we can't ever compute them explicitly because they're way too big. Therefore, in the step where we determine which cluster each sample point is assigned to, we must use the kernel function to obtain the right result. (Review the lecture on kernels if you don't remember how that's done.)
Algorithm 1: Kernel k-means

    Require: data matrix $X \in \mathbb{R}^{n \times d}$; number of clusters $K$; kernel function $\kappa(x_1, x_2)$
    Ensure: cluster label class($j$) for each sample $x_j$
    function Kernel-k-means($X$, $K$)
        randomly initialize class($j$) to be an integer in $\{1, 2, \ldots, K\}$ for each $x_j$
        while not converged do
            ______________________________  (the underlined update step you are asked to derive)
        return class($j$) for each $j$
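To suggest the shape of the answer without spelling out the derivation: every quantity the assignment step needs can be read off the Gram matrix. Below is a sketch under that assumption; the function name and the precomputed matrix `K_mat[a, b]` $= \kappa(x_a, x_b)$ are our own conventions, not part of the handout.

```python
import numpy as np

def kernel_assignment_step(K_mat, labels, k):
    """One cluster-reassignment pass using only the Gram matrix
    K_mat[a, b] = kappa(x_a, x_b); Phi(x) and mu_i are never formed."""
    n = K_mat.shape[0]
    costs = np.full((n, k), np.inf)
    for i in range(k):
        members = np.flatnonzero(labels == i)
        if members.size == 0:
            continue  # empty cluster: leave its cost at infinity
        # ||Phi(x_j) - mu_i||^2 expands, via the kernel trick, into
        #   kappa(x_j, x_j)
        #   - (2 / |C_i|)   * sum over l in C_i    of kappa(x_j, x_l)
        #   + (1 / |C_i|^2) * sum over l, m in C_i of kappa(x_l, x_m)
        self_term = np.diag(K_mat)
        cross_term = K_mat[:, members].sum(axis=1) / members.size
        intra_term = K_mat[np.ix_(members, members)].sum() / members.size**2
        costs[:, i] = self_term - 2.0 * cross_term + intra_term
    return costs.argmin(axis=1)  # new class(j) for every sample j
```

Note that `self_term` is the same for every cluster and `intra_term` does not depend on $j$; those observations are relevant to the last part of this problem.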
- The expression you derived may have unnecessary terms or redundant kernel computations. Explain how to eliminate them; that is, how to perform the computation quickly without doing irrelevant computations or redoing computations already done.
2 Low-Rank Approximation
Low-rank approximation tries to find an approximation to a given matrix, where the approximation matrix has a lower rank than the original matrix. This is useful for mathematical modeling and data compression. Mathematically, given a matrix $M$, we try to find $\tilde{M}$ in the following optimization problem:

$$\operatorname*{argmin}_{\tilde{M}} \; \|M - \tilde{M}\|_F \quad \text{subject to} \quad \operatorname{rank}(\tilde{M}) \le k \tag{1}$$

where $\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2}$ is the Frobenius norm, i.e., the square root of the sum of squares of all entries in the matrix.

This problem can be solved using Singular Value Decomposition (SVD). Specifically, let $M = U \Sigma V^\top$, where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$. Then a rank-$k$ approximation of $M$ can be written as $\tilde{M} = U \tilde{\Sigma} V^\top$, where $\tilde{\Sigma} = \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)$. In this problem, we aim to apply this approximation method to grayscale images, which can be thought of as 2D matrices.
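A minimal numpy sketch of this truncation is below (by the Eckart–Young theorem, the truncated SVD gives the Frobenius-optimal rank-$k$ approximation). The function name is illustrative, and loading the image as a 2D float array is assumed to happen elsewhere.

```python
import numpy as np

def rank_k_approx(M, k):
    """Best rank-k approximation of M in Frobenius norm: keep the k
    largest singular values and zero out the rest (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

For example, `rank_k_approx(img, 20)` would produce the rank-20 image for the parts below.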
- Using the image low-rankdata/face.jpg, perform a rank-5, rank-20, and rank-100 approximation on the image. Show both the original image as well as the low-rank images you obtain in your report.
- Now perform the same rank-5, rank-20, and rank-100 approximation on low-rankdata/sky.jpg. Show both the original image as well as the low-rank images you obtain in your report.
- In one plot, plot the Mean Squared Error (MSE) between the rank-$k$ approximation and the original image for both low-rankdata/face.jpg and low-rankdata/sky.jpg, for $k$ ranging from 1 to 100. Be sure to label each curve in the plot. The MSE between two images $I, J \in \mathbb{R}^{w \times h}$ is

$$\mathrm{MSE}(I, J) = \sum_{i,j} (I_{i,j} - J_{i,j})^2. \tag{2}$$
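One possible way to compute this curve reuses a single SVD across all values of $k$ instead of refactoring the matrix each time; the function `mse_curve` and its arguments are illustrative.

```python
import numpy as np

def mse_curve(M, ks):
    """Error of Eq. (2) between M and its rank-k approximation for each
    k in ks, computed from one SVD of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    errors = []
    for k in ks:
        approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        errors.append(float(((M - approx) ** 2).sum()))
    return errors
```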
- Find the lowest-rank approximation for which you begin to have a hard time differentiating the original and the approximated images. Compare your results for the face image and the sky image. What are the possible reasons for the difference?
[1] Just as the interpretation of kernels in kernelized ridge regression involves an implicit prior/regularizer as well as an implicit feature space, we can think of kernels as generally inducing an implicit distance metric as well. Think of how you would represent the squared distance between two points in terms of pairwise inner products and operations on them.