Problem 1: We would like to test the ability of the first kernel method (discussed in class) to approximate pdfs. We therefore generate 1000 realizations of a random variable uniformly distributed in [0,1]. a) Approximate the corresponding pdf using the Gaussian kernel

K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}.

Plot the resulting approximations for different values of h. Do not limit your graph to [0,1] but extend it beyond the two boundaries. What do you observe as far as capturing the support of the random variable (the interval [0,1]) and the constant value of the pdf (equal to 1 on [0,1]) is concerned? b) Repeat the same steps for the Laplacian kernel

K(x) = \frac{1}{2} e^{-|x|}.
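A minimal Matlab sketch of this experiment, assuming the standard kernel estimator f̂(x) = (1/(Nh)) Σ_i K((x − X_i)/h); the bandwidth values below are illustrative, not prescribed by the problem:

N = 1000;
X = rand(N, 1);                          % 1000 uniform samples in [0,1]
x = linspace(-0.5, 1.5, 500);            % evaluate beyond both boundaries
figure; hold on;
for h = [0.01 0.05 0.2]                  % try several bandwidths
    fhat = zeros(size(x));
    for i = 1:N
        u = (x - X(i)) / h;
        fhat = fhat + exp(-u.^2 / 2) / sqrt(2*pi);   % Gaussian kernel
        % For part b), use the Laplacian kernel instead:
        % fhat = fhat + 0.5 * exp(-abs(u));
    end
    plot(x, fhat / (N*h));               % normalized estimate
end
plot(x, (x >= 0 & x <= 1), 'k--');       % true pdf for reference
legend('h = 0.01', 'h = 0.05', 'h = 0.2', 'true pdf');
xlabel('x'); ylabel('estimated pdf');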
Problem 2: The Matlab data file hw4-2data.mat contains two matrices, stars and circles, each being a list of 2-dimensional (2-D) vectors. Each 2-D vector identifies a point in the 2-D space which is labeled either as star or circle. We are interested in developing a classifier that distinguishes between the two sets. We would like to use the kernel method to find a nonlinear separating boundary. To achieve this we assign the label 1 to stars and the label −1 to circles, and if f(X), X = [x_1, x_2]^⊤, is the transformation we would like to apply to the data, then we want to solve the following minimization problem to find the optimum f(X):

\min_{f \in V} \sum_{X_i \in \text{stars}} \big(f(X_i) - 1\big)^2 + \sum_{X_j \in \text{circles}} \big(f(X_j) + 1\big)^2 + \lambda \|f(X)\|^2, \quad (1)
where V is the vector space of functions generated by the Gaussian kernel
K(X,Y) = e^{-\frac{1}{h}\|X-Y\|^2} = e^{-\frac{1}{h}\{(x_1-y_1)^2 + (x_2-y_2)^2\}}.
a) Use the Representer theorem to prove that for the first two sums we can replace f(X) by its orthogonal projection f̄(X) onto the linear subspace generated by K(X, X_i), X_i ∈ stars, and K(X, X_j), X_j ∈ circles, where

\bar{f}(X) = \sum_{X_i \in \text{stars}} \alpha_i K(X, X_i) + \sum_{X_j \in \text{circles}} \beta_j K(X, X_j). \quad (2)
b) For the term ‖f(X)‖² use the orthogonality principle to show that

\|f(X)\|^2 = \|\bar{f}(X)\|^2 + \|f(X) - \bar{f}(X)\|^2 \geq \|\bar{f}(X)\|^2.
Does this suggest that we can replace f(X) with f̄(X) everywhere in the original cost function in (1)? If yes, then WHY? c) Using the form of f̄(X) defined in (2), find the optimum coefficients α_i, β_j. d) Once you identify the optimum f̄(X), explain how you are going to use it to classify a new point X_new as star or circle, given, of course, that f̄(X_new) will not be exactly equal to 1 or −1. e) After you have specified your final classification rule in d), find (numerically) the separating boundary for the two classes in the 2-D space (also place the training points on the 2-D plane to verify the quality of your boundary). Repeat the process for different values of h and λ.
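A minimal Matlab sketch of parts c)–e), assuming the regularized least-squares solution c = (G + λI)^{-1} y that the analysis above should yield (G is the Gram matrix of the training points) and one natural classification rule, sign(f̄(X_new)); the values of h and λ are illustrative:

load('hw4-2data.mat');                    % provides: stars, circles
Z = [stars; circles];                     % training points, one per row
y = [ones(size(stars,1),1); -ones(size(circles,1),1)];   % labels +1/-1
h = 0.5; lambda = 0.1;                    % illustrative values; vary them
D = sum(Z.^2,2) + sum(Z.^2,2)' - 2*(Z*Z');   % squared pairwise distances
G = exp(-D/h);                            % Gram matrix K(X_i, X_j)
c = (G + lambda*eye(size(G))) \ y;        % stacked coefficients alpha_i, beta_j

% Evaluate fbar on a grid and draw its zero-level set as the boundary.
g = linspace(min(Z(:)) - 1, max(Z(:)) + 1, 200);
[x1, x2] = meshgrid(g, g);
P = [x1(:) x2(:)];
Dg = sum(P.^2,2) + sum(Z.^2,2)' - 2*(P*Z');
f = exp(-Dg/h) * c;                       % fbar(X) at every grid point
figure; hold on;
contour(x1, x2, reshape(f, size(x1)), [0 0], 'k');   % boundary fbar(X) = 0
plot(stars(:,1), stars(:,2), 'r*');
plot(circles(:,1), circles(:,2), 'bo');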
Problem 3: Suppose that the scalar random variable y and the random vector X are related through the following model

y = \theta^\top X + w, \quad (3)

where w is a scalar random variable with mean 0 and independent from X, and θ is a deterministic vector. Assuming that we have knowledge of the statistical behavior of the random pair (y, X), we would like to estimate θ by solving the following minimization problem

\min_{A} E\big[(y - A^\top X)^2\big]. \quad (4)
a) Find the optimum A and prove that it is equal to θ. b) Assume now that the statistical behavior of the pair (y, X) is not available and, instead, you are observing pairs (y_t, X_t) that are independent realizations of (y, X). Find the learning (adaptive) algorithm that provides a new estimate θ_t every time a new pair (y_t, X_t) becomes available. The resulting algorithm is known as Least Mean Squares (LMS) and constitutes the best-known algorithm in Adaptive Signal Processing. c) Use your own favorite θ, of length 5, and generate pairs (y_t, X_t) using the model in (3), where X_t is a Gaussian vector of length 5 with independent and identically distributed elements of mean 0 and variance 1, and w_t is again Gaussian, independent from X_t, with mean 0 and variance 0.1. Apply your algorithm from b) and verify its convergence towards θ. Select a small learning rate so that your estimates are not very noisy. Plot the squared norm ‖θ_t − θ‖² as a function of the iteration t and examine whether it converges to something small. Use a logarithmic scale (semilogy in Matlab) to be able to observe differences between small quantities. d) Repeat the previous simulation but with a learning rate that is half the learning rate you used in question c). What do you observe as far as convergence rate and steady-state error are concerned? Can you compare the performance in the two cases and claim that one is better than the other?
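A minimal Matlab sketch of the simulation in parts c)–d), assuming the standard LMS update θ_t = θ_{t−1} + μ X_t (y_t − X_t^⊤ θ_{t−1}) that part b) should produce; the choices of θ, μ, and the number of iterations below are illustrative:

theta = [1; -2; 0.5; 3; -1];              % your favorite theta, length 5
T = 20000; mu = 0.01;                     % iterations and learning rate
est = zeros(5, 1);                        % running estimate theta_t
err = zeros(T, 1);
for t = 1:T
    X = randn(5, 1);                      % iid N(0,1) regressor, length 5
    w = sqrt(0.1) * randn;                % noise with mean 0, variance 0.1
    y = theta' * X + w;                   % model (3)
    est = est + mu * X * (y - X' * est);  % LMS update
    err(t) = norm(est - theta)^2;         % squared estimation error
end
semilogy(err);                            % log scale to see small values
xlabel('iteration t'); ylabel('||theta_t - theta||^2');
% For part d), rerun with mu/2 and compare convergence speed and the
% steady-state error floor of the two curves.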