Exercise 1: Euclidean distance classifier (10 points). A Euclidean distance classifier represents each class $k = 1,\dots,K$ by a prototype vector $\mu_k \in \mathbb{R}^D$ and classifies a pattern $x \in \mathbb{R}^D$ as the class of its closest prototype: $k^* = \arg\min_{k=1,\dots,K} \|x - \mu_k\|$. Prove that a Gaussian classifier with shared isotropic covariances (i.e., of the form $\Sigma_k = \sigma^2 I$ for $k = 1,\dots,K$, where $\sigma > 0$) and equal class priors (i.e., $p(C_1) = \dots = p(C_K) = \frac{1}{K}$) is equivalent to a Euclidean distance classifier. Prove that the class discriminant functions $g_1(x),\dots,g_K(x)$ are linear and give the expression that defines them.
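The following is not part of the required proof, just a quick numerical sanity check of the claimed equivalence, assuming NumPy; the prototypes, $\sigma$ and test patterns are arbitrary made-up values.

```python
# Sanity check (not a proof): with shared isotropic covariances and equal
# priors, the Gaussian classifier's decisions match the Euclidean distance
# classifier's. All parameter values below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, K, sigma = 3, 4, 0.7
mus = rng.normal(size=(K, D))          # class prototypes mu_k
X = rng.normal(size=(100, D))          # test patterns

# Euclidean distance classifier: pick the closest prototype.
dists = np.linalg.norm(X[:, None, :] - mus[None, :, :], axis=2)
pred_euclid = dists.argmin(axis=1)

# Gaussian classifier: g_k(x) = log p(x|C_k) + log p(C_k), with
# p(x|C_k) = N(mu_k, sigma^2 I) and p(C_k) = 1/K.
log_lik = -0.5 * (dists / sigma) ** 2 - 0.5 * D * np.log(2 * np.pi * sigma**2)
g = log_lik + np.log(1.0 / K)
pred_gauss = g.argmax(axis=1)

assert np.array_equal(pred_euclid, pred_gauss)
print("Both classifiers agree on all test patterns.")
```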
Exercise 2: bias and variance of an estimator. Assume we have a sample $X = \{x_1,\dots,x_N\} \subset \mathbb{R}$ of $N$ iid (independent identically distributed) scalar random variables, each of which is drawn from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$. We want to estimate the mean $\mu$ of this Gaussian by computing a statistic $\theta(X)$ of the sample $X$. Consider the following four different statistics of the sample:
- $\theta_1(X) = 7$.
- $\theta_2(X) = x_1$.
- $\theta_3(X) = \frac{1}{N} \sum_{n=1}^N x_n$.
- $\theta_4(X) = x_1 x_2$.
For each statistic $\theta$, compute:
- Its bias $b(\theta) = E_X\{\theta(X)\} - \mu$.
- Its variance $\operatorname{var}\{\theta\} = E_X\{(\theta(X) - E_X\{\theta(X)\})^2\}$.
- Its mean square error $e(\theta, \mu) = E_X\{(\theta(X) - \mu)^2\}$.
Based on that, answer the following for each estimator (statistic): is it unbiased? is it consistent? Hint: expectations wrt the distribution of the N-point sample X are like this one:
$E_X\{\theta(X)\} = \int \theta(x_1,\dots,x_N)\, p(x_1,\dots,x_N)\, dx_1 \cdots dx_N \overset{\text{iid}}{=} \int \theta(x_1,\dots,x_N)\, p(x_1) \cdots p(x_N)\, dx_1 \cdots dx_N.$
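As a complement to the analytic derivations the exercise asks for, here is a small Monte Carlo illustration of the bias/variance/MSE definitions for the four statistics as listed above, assuming NumPy; the values of $\mu$, $\sigma$ and $N$ are made up.

```python
# Monte Carlo illustration (not a substitute for the analytic answers):
# estimate bias, variance and MSE of each statistic by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 20, 200_000

X = rng.normal(mu, sigma, size=(trials, N))   # 'trials' samples of size N
stats = {
    "theta1 = 7":           np.full(trials, 7.0),
    "theta2 = x1":          X[:, 0],
    "theta3 = sample mean": X.mean(axis=1),
    "theta4 = x1*x2":       X[:, 0] * X[:, 1],
}
for name, t in stats.items():
    bias = t.mean() - mu                      # E{theta(X)} - mu
    var = t.var()                             # E{(theta(X) - E{theta(X)})^2}
    mse = np.mean((t - mu) ** 2)              # E{(theta(X) - mu)^2}
    print(f"{name:22s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
```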
Exercise 3: PCA and LDA. Consider 2D data points coming from a mixture of two Gaussians with equal proportions, different means, and equal, diagonal covariances (where $\sigma_1, \sigma_2 > 0$):
$x \in \mathbb{R}^2$: $p(x) = \pi_1\, p(x|1) + \pi_2\, p(x|2)$ with $p(x|1) = \mathcal{N}(\mu_1, \Sigma_1)$, $p(x|2) = \mathcal{N}(\mu_2, \Sigma_2)$,
$\pi_1 = \pi_2 = \frac{1}{2}$, $\mu_1 \neq \mu_2$, $\Sigma_1 = \Sigma_2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2)$.
- Compute the mean $\mu$ and covariance $\Sigma$ of the mixture distribution $p(x)$.
Hint: let $p(x) = \sum_{k=1}^K \pi_k\, p(x|k)$ for $x \in \mathbb{R}^D$ be a mixture of $K$ densities, where $\pi_1,\dots,\pi_K \in [0,1]$ with $\sum_{k=1}^K \pi_k = 1$ are the component proportions (prior probabilities) and $p(x|k)$, for $k = 1,\dots,K$, the component densities (e.g. Gaussian, but not necessarily). Let $\mu_k$ and $\Sigma_k$ be the mean and covariance of component density $k$, for $k = 1,\dots,K$.
Then, the mean and covariance of the mixture are (you should be able to prove this statement):
$\mu = \sum_{k=1}^K \pi_k\, \mu_k, \qquad \Sigma = \sum_{k=1}^K \pi_k \left( \Sigma_k + \mu_k \mu_k^T \right) - \mu \mu^T.$
- Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $u_1, u_2 \in \mathbb{R}^2$ of $\Sigma$. Can we have $\lambda_2 > 0$?
- Find the PCA projection to dimension 1.
- Compute the within-class and between-class scatter matrices $S_W$, $S_B$ of $p$.
- Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $v_1, v_2 \in \mathbb{R}^2$ of $S_W^{-1} S_B$. Can we have $\lambda_2 > 0$?
- Compute the LDA projection.
- When does PCA find the same projection as LDA? Give a condition and explain it.
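The sketch below works through the exercise numerically for one concrete, made-up choice of the mixture parameters ($\pi_k$, $\mu_k$ and the diagonal covariance are assumptions, not values given in the exercise); it uses the mixture mean/covariance formula from the hint and compares the PCA and LDA directions. NumPy is assumed.

```python
# Numerical sketch for one concrete parameter choice (values are made up):
# mixture mean/covariance, PCA direction, and LDA direction.
import numpy as np

pi1 = pi2 = 0.5
mu1, mu2 = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
Sigma = np.diag([1.0, 4.0])                 # shared diagonal covariance

# Mixture mean and covariance (formula from the hint).
mu = pi1 * mu1 + pi2 * mu2
Sigma_mix = (pi1 * (Sigma + np.outer(mu1, mu1))
             + pi2 * (Sigma + np.outer(mu2, mu2))
             - np.outer(mu, mu))

# PCA: leading eigenvector of the mixture covariance.
evals, evecs = np.linalg.eigh(Sigma_mix)
u1 = evecs[:, np.argmax(evals)]

# LDA (two classes): S_W proportional to the shared covariance, S_B to the
# outer product of the mean difference (scale factors omitted; they do not
# change the directions). The leading eigenvector of S_W^{-1} S_B is
# proportional to S_W^{-1} (mu1 - mu2).
SW = Sigma
v1 = np.linalg.solve(SW, mu1 - mu2)
v1 /= np.linalg.norm(v1)

print("mixture mean:", mu, "\nmixture covariance:\n", Sigma_mix)
print("PCA direction:", u1, "\nLDA direction:", v1)
```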
Exercise 4: variations of k-means clustering. Consider the k-means error function:
$E(\{\mu_k\}_{k=1}^K, Z) = \sum_{n=1}^N \sum_{k=1}^K z_{nk}\, \|x_n - \mu_k\|^2 \quad \text{s.t.} \quad Z \in \{0,1\}^{N \times K},\ Z\mathbf{1} = \mathbf{1}$
over the centroids $\mu_1,\dots,\mu_K$ and cluster assignments $Z \in \{0,1\}^{N \times K}$, given training points $x_1,\dots,x_N \in \mathbb{R}^D$.
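For reference when designing the variations below, here is a minimal sketch of standard (Lloyd's) k-means for this error function, assuming NumPy; the random initialization and synthetic data are arbitrary choices.

```python
# Minimal sketch of standard k-means: alternate the assignment step and the
# centroid update step until the centroids stop changing.
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, size=K, replace=False)]      # init centroids from data
    for _ in range(iters):
        # Assignment step: z_nk = 1 for the closest centroid.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z

X = np.random.default_rng(1).normal(size=(200, 2))
centroids, labels = kmeans(X, K=3)
print(centroids)
```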
- Variation 1: in k-means, the centroids can take any value in $\mathbb{R}^D$: $\mu_k \in \mathbb{R}^D$, $k = 1,\dots,K$. Now we want the centroids to take values from among the training points only: $\mu_k \in \{x_1,\dots,x_N\}$, $k = 1,\dots,K$.
- (8 points) Design a clustering algorithm that minimizes the k-means error function while respecting the above constraint. Your algorithm should converge to a local optimum of the error function. Give the steps of the algorithm explicitly.
- (2 points) Can you imagine when this algorithm would be useful, or preferable to k-means?
- Variation 2: in k-means, we seek $K$ clusters, each characterized by a centroid $\mu_k$. Imagine we seek instead $K$ lines (or hyperplanes, in general), each characterized by a weight vector $w_k \in \mathbb{R}^D$ and bias $w_{k0} \in \mathbb{R}$, given a supervised dataset $\{(x_n, y_n)\}_{n=1}^N$ (see figure). Data points assigned to line $k$ should have minimum least-squares error $\sum_{n \in \text{line}\,k} (y_n - w_k^T x_n - w_{k0})^2$ (a sketch of this per-line least-squares fit appears after this exercise).
- (8 points) Give an error function that allows us to learn the lines' parameters.
- (12 points) Give an iterative algorithm that minimizes that error function.
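The sketch referenced above illustrates only the inner fitting step for one line (fitting $w_k$, $w_{k0}$ by least squares to the points currently assigned to line $k$); the full error function and alternating algorithm are left to the exercise. NumPy is assumed and the data are made up.

```python
# Sketch of the per-line fitting step only: least-squares fit of
# y ~ w^T x + w0 on the points assigned to one line.
import numpy as np

def fit_line(Xk, yk):
    """Least-squares fit on the points assigned to one line/hyperplane."""
    A = np.hstack([Xk, np.ones((Xk.shape[0], 1))])   # append a ones column for the bias
    sol, *_ = np.linalg.lstsq(A, yk, rcond=None)
    w, w0 = sol[:-1], sol[-1]
    return w, w0

rng = np.random.default_rng(0)
Xk = rng.uniform(-1, 1, size=(50, 1))                # points assigned to this line
yk = 3.0 * Xk[:, 0] - 0.5 + 0.1 * rng.normal(size=50)
w, w0 = fit_line(Xk, yk)
print("w =", w, "w0 =", w0)
print("least-squares error:", np.sum((yk - Xk @ w - w0) ** 2))
```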
Exercise 5: mean-shift algorithm (10 points). Consider a Gaussian kernel density estimate
$p(x) = \frac{1}{N} \sum_{n=1}^N \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\!\left( -\frac{1}{2} \left\| \frac{x - x_n}{\sigma} \right\|^2 \right), \quad x \in \mathbb{R}^D.$
Derive the mean-shift algorithm, which iterates the following expression:
$x \leftarrow \sum_{n=1}^N p(n|x)\, x_n \quad \text{where} \quad p(n|x) = \frac{\exp\!\left( -\frac{1}{2} \|(x - x_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\!\left( -\frac{1}{2} \|(x - x_{n'})/\sigma\|^2 \right)}$
until convergence to a maximum of $p$ (or, in general, a stationary point of $p$, satisfying $\nabla p(x) = 0$). Hint: take the gradient of $p$ wrt $x$, equate it to zero and rearrange the resulting expression.
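The derivation is the point of the exercise; as an aside, here is a minimal sketch of the resulting iteration for a Gaussian KDE, assuming NumPy; the data, bandwidth and starting point are arbitrary.

```python
# Minimal mean-shift sketch: iterate x <- sum_n p(n|x) x_n until convergence.
import numpy as np

def mean_shift(x, X, sigma, tol=1e-8, max_iters=1000):
    """Run the mean-shift iteration from starting point x on data X."""
    for _ in range(max_iters):
        d2 = ((x - X) ** 2).sum(axis=1)              # squared distances to data points
        w = np.exp(-0.5 * d2 / sigma**2)             # unnormalized Gaussian weights
        p = w / w.sum()                              # posterior responsibilities p(n|x)
        x_new = p @ X                                # weighted mean of the data
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

X = np.random.default_rng(0).normal(size=(100, 2))  # data defining the KDE
mode = mean_shift(np.array([0.5, -0.5]), X, sigma=0.5)
print("converged to:", mode)
```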
Bonus exercise: nonparametric regression (20 points). Consider the Gaussian kernel smoother
$g(x) = \sum_{n=1}^N w_n(x)\, y_n \quad \text{where} \quad w_n(x) = \frac{\exp\!\left( -\frac{1}{2} \|(x - x_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\!\left( -\frac{1}{2} \|(x - x_{n'})/\sigma\|^2 \right)},$
estimated on a training set $\{(x_n, y_n)\}_{n=1}^N$.
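Before the questions, here is a minimal sketch of evaluating this Gaussian kernel smoother at a query point, assuming NumPy; the 1D training data and bandwidth below are made up.

```python
# Minimal Gaussian kernel smoother: g(x) = sum_n w_n(x) y_n with normalized
# Gaussian weights w_n(x).
import numpy as np

def kernel_smoother(x, X, Y, sigma):
    """Evaluate the kernel smoother at a single query point x."""
    d2 = ((x - X) ** 2).sum(axis=1)       # squared distances to training inputs
    w = np.exp(-0.5 * d2 / sigma**2)      # unnormalized Gaussian weights
    w /= w.sum()                          # normalize so the weights sum to 1
    return w @ Y

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))                 # training inputs x_n
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)      # training outputs y_n
print(kernel_smoother(np.array([0.0]), X, Y, sigma=0.5))
```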
- (7 points) What is g(x) if the training set has only one point (N = 1)? Explain.
Sketch the solution in 1D (i.e., when both $x_n, y_n \in \mathbb{R}$). Compare with using a least-squares linear regression.
- (13 points) Prove that, with $N = 2$ points, we can write $g(x) = \alpha(x)\, y_1 + (1 - \alpha(x))\, y_2$, where $\alpha(x)$ can be written using the logistic function. Give the detailed expression for $\alpha(x)$.
Sketch the solution in 1D.
Compare with using a least-squares linear regression.