- Submit a PDF of your homework, with an appendix listing all your code, to the Gradescope assignment entitled Homework 7 Write-Up. In addition, please include, as your solutions to each coding problem, the specific subset of code relevant to that part of the problem. You may typeset your homework in LaTeX or Word (submit PDF format, not .doc/.docx format) or submit neatly handwritten and scanned solutions. Please start each question on a new page. If there are graphs, include those graphs in the correct sections. Do not put them in an appendix. We need each solution to be self-contained on pages of its own.
- In your write-up, please state with whom you worked on the homework.
- In your write-up, please copy the following statement and sign your signature next to it. (Mac Preview and Foxit PDF Reader, among others, have tools to let you sign a PDF file.) We want to make it extra clear so that no one inadvertently cheats.
I certify that all solutions are entirely in my own words and that I have not looked at another student's solutions. I have given credit to all external sources I consulted.
- Submit all the code needed to reproduce your results to the Gradescope assignment entitled Homework 7 Code. Yes, you must submit your code twice: once in your PDF write-up following the directions as described above so the readers can easily read it, and once in compilable/interpretable form so the readers can easily run it. Do NOT include any data files we provided. Please include a short file named README listing your name, student ID, and instructions on how to reproduce your results. Please take care that your code doesn't take up inordinate amounts of time or memory. If your code cannot be executed, your solution cannot be verified.
1 Regularized and Kernel k-Means
Recall that in k-means clustering we attempt to minimize the objective

$$\min_{C_1, C_2, \ldots, C_k} \; \sum_{i=1}^{k} \sum_{x_j \in C_i} \|x_j - \mu_i\|_2^2, \quad \text{where}$$

$$\mu_i = \operatorname*{argmin}_{\mu \in \mathbb{R}^d} \sum_{x_j \in C_i} \|x_j - \mu\|_2^2 = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j, \qquad i = 1, 2, \ldots, k.$$

The samples are $\{x_1, \ldots, x_n\}$, where $x_j \in \mathbb{R}^d$. $C_i$ is the set of sample points assigned to cluster $i$ and $|C_i|$ is its cardinality. Each sample point is assigned to exactly one cluster.
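For concreteness, below is a minimal sketch of the standard alternating minimization (Lloyd's algorithm) that this objective suggests: repeat the assignment step and the mean-update step until the means stop moving. The function name `kmeans` and its signature are illustrative, not part of the assignment.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest
    mean, then recompute each mean as the centroid of its cluster."""
    rng = np.random.default_rng(seed)
    # Initialize the k means as k distinct sample points.
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: squared distance of every x_j to every mu_i.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mu_i = (1 / |C_i|) * sum of the points in C_i.
        new_mu = np.array([X[labels == i].mean(axis=0) if (labels == i).any()
                           else mu[i] for i in range(k)])
        if np.allclose(new_mu, mu):  # converged: means stopped moving
            break
        mu = new_mu
    return labels, mu
```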
- What is the minimum value of the objective when k = n (the number of clusters equals the number of sample points)?
- (Regularized k-means) Suppose we add a regularization term to the above objective. The objective is now

$$\sum_{i=1}^{k} \left( \lambda \|\mu_i\|_2^2 + \sum_{x_j \in C_i} \|x_j - \mu_i\|_2^2 \right).$$

Show that the optimum of

$$\min_{\mu \in \mathbb{R}^d} \; \lambda \|\mu\|_2^2 + \sum_{x_j \in C_i} \|x_j - \mu\|_2^2$$

is obtained at $\mu_i = \frac{1}{|C_i| + \lambda} \sum_{x_j \in C_i} x_j$.
- Here is an example where we would want to regularize clusters. Suppose there are $n$ students who live in a Euclidean world $\mathbb{R}^2$ and who wish to share rides efficiently to Berkeley for their final exam in CS 189. The university permits $k$ vehicles, which may be used for shuttling students to the exam location. The students need to figure out $k$ good locations to meet up. Each student will walk to the closest meet-up point, and then the shuttles will deliver them to the exam location. Let $x_j$ be the location of student $j$, and let the exam location be at $(0, 0)$. Assume that we can drive as the crow flies, i.e., by taking the shortest path between two points. Write down an appropriate objective function to minimize the total distance that the students and vehicles need to travel. Hint: your result should be similar to the regularized k-means objective.
- (Kernel k-means) Suppose we have a dataset $\{x_i\}_{i=1}^{n}$, $x_i \in \mathbb{R}^\ell$, that we want to split into $k$ clusters, i.e., find the best k-means clustering without the regularization. Furthermore, suppose we know a priori that this data is best clustered in an impractically high-dimensional feature space $\mathbb{R}^m$ with an appropriate metric. Fortunately, instead of having to deal with the (implicit) feature map $\Phi: \mathbb{R}^\ell \to \mathbb{R}^m$ and (implicit) distance metric[1], we have a kernel function $\kappa(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2)$ that we can compute easily on the raw samples. How should we perform the kernelized counterpart of k-means clustering?
Derive the underlined portion of this algorithm, and show your work in deriving it. The main issue is that although we define the means $\mu_i$ in the usual way, we can't ever compute them explicitly because they're way too big. Therefore, in the step where we determine which cluster each sample point is assigned to, we must use the kernel function to obtain the right result. (Review the lecture on kernels if you don't remember how that's done.)
Algorithm 1: Kernel k-means

    Require: data matrix $X \in \mathbb{R}^{n \times d}$; number of clusters $K$; kernel function $\kappa(x_1, x_2)$
    Ensure: cluster label class($j$) for each sample $x_j$
    function Kernel-k-means($X$, $K$)
        randomly initialize class($j$) to be an integer in $\{1, 2, \ldots, K\}$ for each $x_j$
        while not converged do
            ______________________________  (the underlined update step you are asked to derive)
        return class($j$) for each $j$
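To suggest the shape of the answer without spelling out the derivation: every quantity the assignment step needs can be read off the Gram matrix. Below is a sketch under that assumption; the function name and the precomputed matrix `K_mat[a, b]` $= \kappa(x_a, x_b)$ are our own conventions, not part of the handout.

```python
import numpy as np

def kernel_assignment_step(K_mat, labels, k):
    """One cluster-reassignment pass using only the Gram matrix
    K_mat[a, b] = kappa(x_a, x_b); Phi(x) and mu_i are never formed."""
    n = K_mat.shape[0]
    costs = np.full((n, k), np.inf)
    for i in range(k):
        members = np.flatnonzero(labels == i)
        if members.size == 0:
            continue  # empty cluster: leave its cost at infinity
        # ||Phi(x_j) - mu_i||^2 expands, via the kernel trick, into
        #   kappa(x_j, x_j)
        #   - (2 / |C_i|)   * sum over l in C_i    of kappa(x_j, x_l)
        #   + (1 / |C_i|^2) * sum over l, m in C_i of kappa(x_l, x_m)
        self_term = np.diag(K_mat)
        cross_term = K_mat[:, members].sum(axis=1) / members.size
        intra_term = K_mat[np.ix_(members, members)].sum() / members.size**2
        costs[:, i] = self_term - 2.0 * cross_term + intra_term
    return costs.argmin(axis=1)  # new class(j) for every sample j
```

Note that `self_term` is the same for every cluster and `intra_term` does not depend on $j$; those observations are relevant to the last part of this problem.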
- The expression you derived may have unnecessary terms or redundant kernel computations. Explain how to eliminate them; that is, how to perform the computation quickly without doing irrelevant computations or redoing computations already done.
2 Low-Rank Approximation
Low-rank approximation tries to find an approximation to a given matrix, where the approximation matrix has a lower rank than the original matrix. This is useful for mathematical modeling and data compression. Mathematically, given a matrix $M$, we try to find $\tilde{M}$ in the following optimization problem:

$$\operatorname*{argmin}_{\tilde{M}} \; \|M - \tilde{M}\|_F \quad \text{subject to} \quad \operatorname{rank}(\tilde{M}) \le k \tag{1}$$

where $\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2}$ is the Frobenius norm, i.e., the square root of the sum of squares of all entries in the matrix.

This problem can be solved using Singular Value Decomposition (SVD). Specifically, let $M = U \Sigma V^\top$, where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$. Then a rank-$k$ approximation of $M$ can be written as $\tilde{M} = U \tilde{\Sigma} V^\top$, where $\tilde{\Sigma} = \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)$. In this problem, we aim to apply this approximation method to grayscale images, which can be thought of as 2D matrices.
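A minimal numpy sketch of this truncation is below (by the Eckart–Young theorem, the truncated SVD gives the Frobenius-optimal rank-$k$ approximation). The function name is illustrative, and loading the image as a 2D float array is assumed to happen elsewhere.

```python
import numpy as np

def rank_k_approx(M, k):
    """Best rank-k approximation of M in Frobenius norm: keep the k
    largest singular values and zero out the rest (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

For example, `rank_k_approx(img, 20)` would produce the rank-20 image for the parts below.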
- Using the image low-rankdata/face.jpg, perform a rank-5, rank-20, and rank-100 approximation on the image. Show both the original image as well as the low-rank images you obtain in your report.
- Now perform the same rank-5, rank-20, and rank-100 approximation on low-rankdata/sky.jpg. Show both the original image as well as the low-rank images you obtain in your report.
- In one plot, plot the Mean Squared Error (MSE) between the rank-$k$ approximation and the original image for both low-rankdata/face.jpg and low-rankdata/sky.jpg, for $k$ ranging from 1 to 100. Be sure to label each curve in the plot. The MSE between two images $I, J \in \mathbb{R}^{w \times h}$ is

$$\mathrm{MSE}(I, J) = \sum_{i,j} (I_{i,j} - J_{i,j})^2. \tag{2}$$
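One possible way to compute this curve reuses a single SVD across all values of $k$ instead of refactoring the matrix each time; the function `mse_curve` and its arguments are illustrative.

```python
import numpy as np

def mse_curve(M, ks):
    """Error of Eq. (2) between M and its rank-k approximation for each
    k in ks, computed from one SVD of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    errors = []
    for k in ks:
        approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        errors.append(float(((M - approx) ** 2).sum()))
    return errors
```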
- Find the lowest-rank approximation for which you begin to have a hard time differentiating the original and the approximated images. Compare your results for the face image and the sky image. What are the possible reasons for the difference?
[1] Just as the interpretation of kernels in kernelized ridge regression involves an implicit prior/regularizer as well as an implicit feature space, we can think of kernels as generally inducing an implicit distance metric as well. Think of how you would represent the squared distance between two points in terms of pairwise inner products and operations on them.