[Solved] CSE176 HomeWork#2


Exercise 1: Euclidean distance classifier (10 points). A Euclidean distance classifier represents each class $k = 1,\dots,K$ by a prototype vector $\boldsymbol{\mu}_k \in \mathbb{R}^D$ and classifies a pattern $\mathbf{x} \in \mathbb{R}^D$ as the class of its closest prototype: $k^* = \arg\min_{k=1,\dots,K} \|\mathbf{x} - \boldsymbol{\mu}_k\|$. Prove that a Gaussian classifier with shared isotropic covariances (i.e., of the form $\Sigma_k = \sigma^2 I$ for $k = 1,\dots,K$, where $\sigma > 0$) and equal class priors (i.e., $p(C_1) = \cdots = p(C_K) = \frac{1}{K}$) is equivalent to a Euclidean distance classifier. Prove that the class discriminant functions $g_1(\mathbf{x}),\dots,g_K(\mathbf{x})$ are linear and give the expression that defines them.
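For concreteness (this is not part of the assignment statement), the classification rule above can be sketched in a few lines of NumPy; the prototypes and test points below are arbitrary, assumed values.

```python
import numpy as np

def euclidean_classify(X, prototypes):
    """Assign each row of X (N x D) to the nearest prototype (K x D).

    Returns an array of N class indices k* = argmin_k ||x - mu_k||.
    """
    # Squared Euclidean distances between every point and every prototype: N x K.
    dists = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Hypothetical usage: 3 classes in 2D.
mus = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.array([[0.2, 0.1], [2.8, 0.4], [0.3, 2.9]])
print(euclidean_classify(X, mus))  # expected: [0 1 2]
```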

Exercise 2: bias and variance of an estimator. Assume we have a sample $X = \{x_1,\dots,x_N\} \subset \mathbb{R}$ of $N$ iid (independent, identically distributed) scalar random variables, each of which is drawn from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$. We want to estimate the mean $\mu$ of this Gaussian by computing a statistic of the sample $X$. Consider the following four different statistics of the sample:

  1. $\hat{\theta}_1(X) = 7$.
  2. $\hat{\theta}_2(X) = x_1$.
  3. $\hat{\theta}_3(X) = \frac{1}{N} \sum_{n=1}^N x_n$.
  4. $\hat{\theta}_4(X) = x_1 x_2$.

For each statistic $\hat{\theta}$, compute:

  • Its bias $b(\hat{\theta}) = E_X\{\hat{\theta}(X)\} - \mu$.
  • Its variance $\operatorname{var}\{\hat{\theta}\} = E_X\{(\hat{\theta}(X) - E_X\{\hat{\theta}(X)\})^2\}$.
  • Its mean square error $e(\hat{\theta}, \mu) = E_X\{(\hat{\theta}(X) - \mu)^2\}$.

Based on that, answer the following for each estimator (statistic): is it unbiased? is it consistent? Hint: expectations wrt the distribution of the $N$-point sample $X$ are like this one:

$$E_X\{\hat{\theta}(X)\} = \int \hat{\theta}(x_1,\dots,x_N)\, p(x_1,\dots,x_N)\, dx_1 \cdots dx_N \overset{\text{iid}}{=} \int \hat{\theta}(x_1,\dots,x_N)\, p(x_1) \cdots p(x_N)\, dx_1 \cdots dx_N.$$
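As an optional sanity check (not part of the assignment), the bias, variance, and MSE definitions above can be approximated by Monte Carlo over many samples; the values of `mu`, `sigma`, and `N` below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 20, 100_000  # assumed values, not from the assignment

# Draw `trials` independent samples X, each with N points.
X = rng.normal(mu, sigma, size=(trials, N))

stats = {
    "theta1 = 7":       np.full(trials, 7.0),
    "theta2 = x1":      X[:, 0],
    "theta3 = mean(X)": X.mean(axis=1),
    "theta4 = x1*x2":   X[:, 0] * X[:, 1],
}

for name, t in stats.items():
    bias = t.mean() - mu          # E{theta} - mu
    var = t.var()                 # E{(theta - E{theta})^2}
    mse = ((t - mu) ** 2).mean()  # E{(theta - mu)^2} = bias^2 + var
    print(f"{name:18s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
```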

Exercise 3: PCA and LDA. Consider 2D data points coming from a mixture of two Gaussians with equal proportions, different means, and equal, diagonal covariances (where $\sigma_1, \sigma_2 > 0$):

$$\mathbf{x} \in \mathbb{R}^2: \quad p(\mathbf{x}) = \pi_1\, p(\mathbf{x}|1) + \pi_2\, p(\mathbf{x}|2), \qquad p(\mathbf{x}|1) \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1), \quad p(\mathbf{x}|2) \sim \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2),$$

with $\pi_1 = \pi_2 = \frac{1}{2}$, $\boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2$, and $\Sigma_1 = \Sigma_2 = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$.

  1. Compute the mean and covariance of the mixture distribution p(x).

Hint: let $p(\mathbf{x}) = \sum_{k=1}^K \pi_k\, p(\mathbf{x}|k)$ for $\mathbf{x} \in \mathbb{R}^D$ be a mixture of $K$ densities, where $\pi_1,\dots,\pi_K \in [0,1]$ with $\sum_{k=1}^K \pi_k = 1$ are the component proportions (prior probabilities) and $p(\mathbf{x}|k)$, for $k = 1,\dots,K$, the component densities (e.g. Gaussian, but not necessarily). Let $\boldsymbol{\mu}_k$ and $\Sigma_k$ be the mean and covariance of component density $k$, for $k = 1,\dots,K$.

Then, the mean and covariance of the mixture are (you should be able to prove this statement; an optional numerical check is sketched after the list below):

$$\boldsymbol{\mu} = \sum_{k=1}^K \pi_k \boldsymbol{\mu}_k, \qquad \Sigma = \sum_{k=1}^K \pi_k \left( \Sigma_k + \boldsymbol{\mu}_k \boldsymbol{\mu}_k^T \right) - \boldsymbol{\mu} \boldsymbol{\mu}^T.$$

  2. Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $\mathbf{u}_1, \mathbf{u}_2 \in \mathbb{R}^2$ of $\Sigma$. Can we have $\lambda_2 > 0$?
  3. Find the PCA projection to dimension 1.
  4. Compute the within-class and between-class scatter matrices $S_W$, $S_B$ of $p$.
  5. Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^2$ of $S_W^{-1} S_B$. Can we have $\lambda_2 > 0$?
  6. Compute the LDA projection.
  7. When does PCA find the same projection as LDA? Give a condition and explain it.
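As referenced in the hint above, here is a small, optional NumPy check of the mixture mean and covariance formulas against sample estimates; the component means and variances used are arbitrary assumptions, not values from the assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.5, 0.5])                   # equal proportions
mus = np.array([[-2.0, 0.0], [2.0, 1.0]])   # assumed component means
Sigma = np.diag([1.0, 0.25])                # shared diagonal covariance diag(sigma1^2, sigma2^2)

# Closed-form mixture moments (the hint's formulas).
mu_mix = pi @ mus
Sigma_mix = sum(p * (Sigma + np.outer(m, m)) for p, m in zip(pi, mus)) - np.outer(mu_mix, mu_mix)

# Monte Carlo estimate from samples of the mixture.
n = 200_000
k = rng.choice(2, size=n, p=pi)
x = mus[k] + rng.multivariate_normal(np.zeros(2), Sigma, size=n)
print("closed-form mean:", mu_mix, "  sample mean:", x.mean(axis=0))
print("closed-form cov:\n", Sigma_mix, "\nsample cov:\n", np.cov(x, rowvar=False))
```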

Exercise 4: variations of k-means clustering. Consider the k-means error function:

$$E\left(\{\boldsymbol{\mu}_k\}_{k=1}^K, Z\right) = \sum_{n=1}^N \sum_{k=1}^K z_{nk}\, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2 \quad \text{s.t.} \quad Z \in \{0,1\}^{N \times K},\ Z\mathbf{1} = \mathbf{1}$$

over the centroids $\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K$ and cluster assignments $Z \in \{0,1\}^{N \times K}$, given training points $\mathbf{x}_1,\dots,\mathbf{x}_N \in \mathbb{R}^D$.
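For reference (not part of the assignment), here is a minimal NumPy sketch of evaluating the error function above for given centroids and a given hard assignment matrix $Z$; the example data at the bottom are assumed values.

```python
import numpy as np

def kmeans_error(X, mu, Z):
    """k-means error E = sum_n sum_k z_nk ||x_n - mu_k||^2.

    X: (N, D) data points, mu: (K, D) centroids,
    Z: (N, K) binary assignment matrix with exactly one 1 per row.
    """
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return float((Z * sq_dists).sum())

# Hypothetical example: 4 points, 2 clusters in 2D.
X = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.5, 5.0]])
mu = np.array([[0.25, 0.0], [5.25, 5.0]])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(kmeans_error(X, mu, Z))  # 4 * 0.25^2 = 0.25
```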

  • Variation 1: in k-means, the centroids can take any value in $\mathbb{R}^D$: $\boldsymbol{\mu}_k \in \mathbb{R}^D$ for $k = 1,\dots,K$. Now we want the centroids to take values from among the training points only: $\boldsymbol{\mu}_k \in \{\mathbf{x}_1,\dots,\mathbf{x}_N\}$ for $k = 1,\dots,K$.
    1. (8 points) Design a clustering algorithm that minimizes the k-means error function while respecting the above constraint. Your algorithm should converge to a local optimum of the error function. Give the steps of the algorithm explicitly.
    2. (2 points) Can you imagine when this algorithm would be useful, or preferable to k-means?
  • Variation 2: in k-means, we seek $K$ clusters, each characterized by a centroid $\boldsymbol{\mu}_k$. Imagine we seek instead $K$ lines (or hyperplanes, in general), each characterized by a weight vector $\mathbf{w}_k \in \mathbb{R}^D$ and bias $w_{k0} \in \mathbb{R}$, given a supervised dataset (see figure). Data points assigned to line $k$ should have minimum least-squares error $\sum_{n \in \text{line } k} (y_n - \mathbf{w}_k^T \mathbf{x}_n - w_{k0})^2$.
    1. (8 points) Give an error function that allows us to learn the lines' parameters.
    2. (12 points) Give an iterative algorithm that minimizes that error function.

Exercise 5: mean-shift algorithm (10 points). Consider a Gaussian kernel density estimate

$$p(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^N \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\left( -\frac{1}{2} \left\| \frac{\mathbf{x} - \mathbf{x}_n}{\sigma} \right\|^2 \right), \qquad \mathbf{x} \in \mathbb{R}^D.$$

Derive the mean-shift algorithm, which iterates the following expression:

$$\mathbf{x} \leftarrow \sum_{n=1}^N q_n \mathbf{x}_n \qquad \text{where} \qquad q_n = \frac{\exp\left( -\frac{1}{2} \|(\mathbf{x} - \mathbf{x}_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\left( -\frac{1}{2} \|(\mathbf{x} - \mathbf{x}_{n'})/\sigma\|^2 \right)}, \quad n = 1,\dots,N,$$

until convergence to a maximum of $p$ (or, in general, a stationary point of $p$, satisfying $\nabla p(\mathbf{x}) = \mathbf{0}$). Hint: take the gradient of $p$ wrt $\mathbf{x}$, equate it to zero and rearrange the resulting expression.
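For illustration only (the exercise asks for the derivation, not code), here is a hedged NumPy sketch of the iteration written above for a Gaussian kernel density estimate; the data, bandwidth, and starting point are assumed.

```python
import numpy as np

def mean_shift(x, X, sigma, tol=1e-8, max_iter=1000):
    """Iterate x <- sum_n q_n x_n with Gaussian responsibilities q_n until convergence."""
    for _ in range(max_iter):
        w = np.exp(-0.5 * ((x - X) ** 2).sum(axis=1) / sigma**2)  # unnormalized kernel values
        q = w / w.sum()                                           # responsibilities q_n
        x_new = q @ X                                             # weighted mean of the data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical single-mode example in 2D.
rng = np.random.default_rng(2)
X = rng.normal([1.0, -1.0], 0.3, size=(200, 2))
print(mean_shift(np.array([0.0, 0.0]), X, sigma=0.5))  # should approach the mode near (1, -1)
```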

Bonus exercise: nonparametric regression (20 points). Consider the Gaussian kernel smoother

$$g(\mathbf{x}) = \sum_{n=1}^N q_n(\mathbf{x})\, y_n \qquad \text{where} \qquad q_n(\mathbf{x}) = \frac{\exp\left( -\frac{1}{2} \|(\mathbf{x} - \mathbf{x}_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\left( -\frac{1}{2} \|(\mathbf{x} - \mathbf{x}_{n'})/\sigma\|^2 \right)},$$

estimated on a training set $\{(\mathbf{x}_n, y_n)\}_{n=1}^N$.
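A minimal sketch (not required by the assignment) of the kernel smoother defined above, assuming 1D training data and an arbitrary bandwidth:

```python
import numpy as np

def kernel_smoother(x, X, y, sigma):
    """Gaussian kernel smoother g(x) = sum_n q_n(x) y_n for a batch of query points x."""
    # Squared distances between queries (M, D) and training inputs (N, D): (M, N).
    d2 = ((x[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / sigma**2)
    q = w / w.sum(axis=1, keepdims=True)  # normalized weights q_n(x)
    return q @ y

# Hypothetical 1D example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 0.0, -1.0])
queries = np.array([[0.5], [1.5], [2.5]])
print(kernel_smoother(queries, X, y, sigma=0.5))
```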

  1. (7 points) What is $g(\mathbf{x})$ if the training set has only one point ($N = 1$)? Explain.

Sketch the solution in 1D (i.e., when both $x_n, y_n \in \mathbb{R}$). Compare with using a least-squares linear regression.

  2. (13 points) Prove that, with $N = 2$ points, we can write $g(\mathbf{x}) = \alpha(\mathbf{x})\, y_1 + (1 - \alpha(\mathbf{x}))\, y_2$ where $\alpha(\mathbf{x})$ can be written using the logistic function. Give the detailed expression for $\alpha(\mathbf{x})$.

Sketch the solution in 1D.

Compare with using a least-squares linear regression.
