Exercise 1: Euclidean distance classifier (10 points). A Euclidean distance classifier represents each class $k = 1,\dots,K$ by a prototype vector $\mu_k \in \mathbb{R}^D$ and classifies a pattern $x \in \mathbb{R}^D$ as the class of its closest prototype: $k^* = \arg\min_{k=1,\dots,K} \|x - \mu_k\|$. Prove that a Gaussian classifier with shared isotropic covariances (i.e., of the form $\Sigma_k = \sigma^2 I$ for $k = 1,\dots,K$, where $\sigma > 0$) and equal class priors (i.e., $p(C_1) = \dots = p(C_K) = \frac{1}{K}$) is equivalent to a Euclidean distance classifier. Prove that the class discriminant functions $g_1(x),\dots,g_K(x)$ are linear and give the expression that defines them.
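The following is not part of the required proof, just a quick numerical sanity check of the claimed equivalence, assuming NumPy; the prototypes, $\sigma$ and test patterns are arbitrary made-up values.

```python
# Sanity check (not a proof): with shared isotropic covariances and equal
# priors, the Gaussian classifier's decisions match the Euclidean distance
# classifier's. All parameter values below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, K, sigma = 3, 4, 0.7
mus = rng.normal(size=(K, D))          # class prototypes mu_k
X = rng.normal(size=(100, D))          # test patterns

# Euclidean distance classifier: pick the closest prototype.
dists = np.linalg.norm(X[:, None, :] - mus[None, :, :], axis=2)
pred_euclid = dists.argmin(axis=1)

# Gaussian classifier: g_k(x) = log p(x|C_k) + log p(C_k), with
# p(x|C_k) = N(mu_k, sigma^2 I) and p(C_k) = 1/K.
log_lik = -0.5 * (dists / sigma) ** 2 - 0.5 * D * np.log(2 * np.pi * sigma**2)
g = log_lik + np.log(1.0 / K)
pred_gauss = g.argmax(axis=1)

assert np.array_equal(pred_euclid, pred_gauss)
print("Both classifiers agree on all test patterns.")
```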
Exercise 2: bias and variance of an estimator. Assume we have a sample $X = \{x_1,\dots,x_N\} \subset \mathbb{R}$ of $N$ iid (independent identically distributed) scalar random variables, each of which is drawn from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$. We want to estimate the mean $\mu$ of this Gaussian by computing a statistic $\theta(X)$ of the sample $X$. Consider the following four different statistics of the sample:
- $\theta_1(X) = 7$.
- $\theta_2(X) = x_1$.
- $\theta_3(X) = \frac{1}{N} \sum_{n=1}^N x_n$.
- $\theta_4(X) = x_1 x_2$.
For each statistic $\theta$, compute:
- Its bias $b(\theta) = E_X\{\theta(X)\} - \mu$.
- Its variance $\operatorname{var}\{\theta\} = E_X\{(\theta(X) - E_X\{\theta(X)\})^2\}$.
- Its mean square error $e(\theta, \mu) = E_X\{(\theta(X) - \mu)^2\}$.
Based on that, answer the following for each estimator (statistic): is it unbiased? is it consistent? Hint: expectations wrt the distribution of the N-point sample X are like this one:
$E_X\{\theta(X)\} = \int \theta(x_1,\dots,x_N)\, p(x_1,\dots,x_N)\, dx_1 \cdots dx_N \overset{\text{iid}}{=} \int \theta(x_1,\dots,x_N)\, p(x_1) \cdots p(x_N)\, dx_1 \cdots dx_N.$
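As a complement to the analytic derivations the exercise asks for, here is a small Monte Carlo illustration of the bias/variance/MSE definitions for the four statistics as listed above, assuming NumPy; the values of $\mu$, $\sigma$ and $N$ are made up.

```python
# Monte Carlo illustration (not a substitute for the analytic answers):
# estimate bias, variance and MSE of each statistic by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 20, 200_000

X = rng.normal(mu, sigma, size=(trials, N))   # 'trials' samples of size N
stats = {
    "theta1 = 7":           np.full(trials, 7.0),
    "theta2 = x1":          X[:, 0],
    "theta3 = sample mean": X.mean(axis=1),
    "theta4 = x1*x2":       X[:, 0] * X[:, 1],
}
for name, t in stats.items():
    bias = t.mean() - mu                      # E{theta(X)} - mu
    var = t.var()                             # E{(theta(X) - E{theta(X)})^2}
    mse = np.mean((t - mu) ** 2)              # E{(theta(X) - mu)^2}
    print(f"{name:22s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
```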
Exercise 3: PCA and LDA. Consider 2D data points coming from a mixture of two Gaussians with equal proportions, different means, and equal, diagonal covariances (where $\sigma_1, \sigma_2 > 0$):
$x \in \mathbb{R}^2$: $p(x) = \pi_1\, p(x|1) + \pi_2\, p(x|2)$ with $p(x|1) = \mathcal{N}(\mu_1, \Sigma_1)$, $p(x|2) = \mathcal{N}(\mu_2, \Sigma_2)$,
$\pi_1 = \pi_2 = \frac{1}{2}$, $\mu_1 \neq \mu_2$, $\Sigma_1 = \Sigma_2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2)$.
- Compute the mean $\mu$ and covariance $\Sigma$ of the mixture distribution $p(x)$.
Hint: let $p(x) = \sum_{k=1}^K \pi_k\, p(x|k)$ for $x \in \mathbb{R}^D$ be a mixture of $K$ densities, where $\pi_1,\dots,\pi_K \in [0,1]$ with $\sum_{k=1}^K \pi_k = 1$ are the component proportions (prior probabilities) and $p(x|k)$, for $k = 1,\dots,K$, the component densities (e.g. Gaussian, but not necessarily). Let $\mu_k$ and $\Sigma_k$ be the mean and covariance of component density $k$, for $k = 1,\dots,K$.
Then, the mean and covariance of the mixture are (you should be able to prove this statement):
$\mu = \sum_{k=1}^K \pi_k\, \mu_k, \qquad \Sigma = \sum_{k=1}^K \pi_k \left( \Sigma_k + \mu_k \mu_k^T \right) - \mu \mu^T.$
- Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $u_1, u_2 \in \mathbb{R}^2$ of $\Sigma$. Can we have $\lambda_2 > 0$?
- Find the PCA projection to dimension 1.
- Compute the within-class and between-class scatter matrices $S_W$, $S_B$ of $p$.
- Compute the eigenvalues $\lambda_1 \geq \lambda_2 \geq 0$ and corresponding eigenvectors $v_1, v_2 \in \mathbb{R}^2$ of $S_W^{-1} S_B$. Can we have $\lambda_2 > 0$?
- Compute the LDA projection.
- When does PCA find the same projection as LDA? Give a condition and explain it.
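The sketch below works through the exercise numerically for one concrete, made-up choice of the mixture parameters ($\pi_k$, $\mu_k$ and the diagonal covariance are assumptions, not values given in the exercise); it uses the mixture mean/covariance formula from the hint and compares the PCA and LDA directions. NumPy is assumed.

```python
# Numerical sketch for one concrete parameter choice (values are made up):
# mixture mean/covariance, PCA direction, and LDA direction.
import numpy as np

pi1 = pi2 = 0.5
mu1, mu2 = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
Sigma = np.diag([1.0, 4.0])                 # shared diagonal covariance

# Mixture mean and covariance (formula from the hint).
mu = pi1 * mu1 + pi2 * mu2
Sigma_mix = (pi1 * (Sigma + np.outer(mu1, mu1))
             + pi2 * (Sigma + np.outer(mu2, mu2))
             - np.outer(mu, mu))

# PCA: leading eigenvector of the mixture covariance.
evals, evecs = np.linalg.eigh(Sigma_mix)
u1 = evecs[:, np.argmax(evals)]

# LDA (two classes): S_W proportional to the shared covariance, S_B to the
# outer product of the mean difference (scale factors omitted; they do not
# change the directions). The leading eigenvector of S_W^{-1} S_B is
# proportional to S_W^{-1} (mu1 - mu2).
SW = Sigma
v1 = np.linalg.solve(SW, mu1 - mu2)
v1 /= np.linalg.norm(v1)

print("mixture mean:", mu, "\nmixture covariance:\n", Sigma_mix)
print("PCA direction:", u1, "\nLDA direction:", v1)
```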
Exercise 4: variations of k-means clustering. Consider the k-means error function:
$E(\{\mu_k\}_{k=1}^K, Z) = \sum_{n=1}^N \sum_{k=1}^K z_{nk}\, \|x_n - \mu_k\|^2 \quad \text{s.t.} \quad Z \in \{0,1\}^{N \times K},\ Z\mathbf{1} = \mathbf{1}$
over the centroids $\mu_1,\dots,\mu_K$ and cluster assignments $Z \in \{0,1\}^{N \times K}$, given training points $x_1,\dots,x_N \in \mathbb{R}^D$.
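For reference when designing the variations below, here is a minimal sketch of standard (Lloyd's) k-means for this error function, assuming NumPy; the random initialization and synthetic data are arbitrary choices.

```python
# Minimal sketch of standard k-means: alternate the assignment step and the
# centroid update step until the centroids stop changing.
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, size=K, replace=False)]      # init centroids from data
    for _ in range(iters):
        # Assignment step: z_nk = 1 for the closest centroid.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z

X = np.random.default_rng(1).normal(size=(200, 2))
centroids, labels = kmeans(X, K=3)
print(centroids)
```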
- Variation 1: in k-means, the centroids can take any value in $\mathbb{R}^D$: $\mu_k \in \mathbb{R}^D$, $k = 1,\dots,K$. Now we want the centroids to take values from among the training points only: $\mu_k \in \{x_1,\dots,x_N\}$, $k = 1,\dots,K$.
- (8 points) Design a clustering algorithm that minimizes the k-means error function while respecting the above constraint. Your algorithm should converge to a local optimum of the error function. Give the steps of the algorithm explicitly.
- (2 points) Can you imagine when this algorithm would be useful, or preferable to k-means?
- Variation 2: in k-means, we seek $K$ clusters, each characterized by a centroid $\mu_k$. Imagine we seek instead $K$ lines (or hyperplanes, in general), each characterized by a weight vector $w_k \in \mathbb{R}^D$ and bias $w_{k0} \in \mathbb{R}$, given a supervised dataset $\{(x_n, y_n)\}_{n=1}^N$ (see figure). Data points assigned to line $k$ should have minimum least-squares error $\sum_{n \in \text{line}\,k} (y_n - w_k^T x_n - w_{k0})^2$ (a sketch of this per-line least-squares fit appears after this exercise).
- (8 points) Give an error function that allows us to learn the lines' parameters.
- (12 points) Give an iterative algorithm that minimizes that error function.
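The sketch referenced above illustrates only the inner fitting step for one line (fitting $w_k$, $w_{k0}$ by least squares to the points currently assigned to line $k$); the full error function and alternating algorithm are left to the exercise. NumPy is assumed and the data are made up.

```python
# Sketch of the per-line fitting step only: least-squares fit of
# y ~ w^T x + w0 on the points assigned to one line.
import numpy as np

def fit_line(Xk, yk):
    """Least-squares fit on the points assigned to one line/hyperplane."""
    A = np.hstack([Xk, np.ones((Xk.shape[0], 1))])   # append a ones column for the bias
    sol, *_ = np.linalg.lstsq(A, yk, rcond=None)
    w, w0 = sol[:-1], sol[-1]
    return w, w0

rng = np.random.default_rng(0)
Xk = rng.uniform(-1, 1, size=(50, 1))                # points assigned to this line
yk = 3.0 * Xk[:, 0] - 0.5 + 0.1 * rng.normal(size=50)
w, w0 = fit_line(Xk, yk)
print("w =", w, "w0 =", w0)
print("least-squares error:", np.sum((yk - Xk @ w - w0) ** 2))
```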
Exercise 5: mean-shift algorithm (10 points). Consider a Gaussian kernel density estimate
$p(x) = \frac{1}{N} \sum_{n=1}^N \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\!\left( -\frac{1}{2} \left\| \frac{x - x_n}{\sigma} \right\|^2 \right), \quad x \in \mathbb{R}^D.$
Derive the mean-shift algorithm, which iterates the following expression:
$x \leftarrow \sum_{n=1}^N p(n|x)\, x_n \quad \text{where} \quad p(n|x) = \frac{\exp\!\left( -\frac{1}{2} \|(x - x_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\!\left( -\frac{1}{2} \|(x - x_{n'})/\sigma\|^2 \right)}$
until convergence to a maximum of $p$ (or, in general, a stationary point of $p$, satisfying $\nabla p(x) = 0$). Hint: take the gradient of $p$ wrt $x$, equate it to zero and rearrange the resulting expression.
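The derivation is the point of the exercise; as an aside, here is a minimal sketch of the resulting iteration for a Gaussian KDE, assuming NumPy; the data, bandwidth and starting point are arbitrary.

```python
# Minimal mean-shift sketch: iterate x <- sum_n p(n|x) x_n until convergence.
import numpy as np

def mean_shift(x, X, sigma, tol=1e-8, max_iters=1000):
    """Run the mean-shift iteration from starting point x on data X."""
    for _ in range(max_iters):
        d2 = ((x - X) ** 2).sum(axis=1)              # squared distances to data points
        w = np.exp(-0.5 * d2 / sigma**2)             # unnormalized Gaussian weights
        p = w / w.sum()                              # posterior responsibilities p(n|x)
        x_new = p @ X                                # weighted mean of the data
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

X = np.random.default_rng(0).normal(size=(100, 2))  # data defining the KDE
mode = mean_shift(np.array([0.5, -0.5]), X, sigma=0.5)
print("converged to:", mode)
```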
Bonus exercise: nonparametric regression (20 points). Consider the Gaussian kernel smoother
$g(x) = \sum_{n=1}^N w_n(x)\, y_n \quad \text{where} \quad w_n(x) = \frac{\exp\!\left( -\frac{1}{2} \|(x - x_n)/\sigma\|^2 \right)}{\sum_{n'=1}^N \exp\!\left( -\frac{1}{2} \|(x - x_{n'})/\sigma\|^2 \right)},$
estimated on a training set $\{(x_n, y_n)\}_{n=1}^N$.
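Before the questions, here is a minimal sketch of evaluating this Gaussian kernel smoother at a query point, assuming NumPy; the 1D training data and bandwidth below are made up.

```python
# Minimal Gaussian kernel smoother: g(x) = sum_n w_n(x) y_n with normalized
# Gaussian weights w_n(x).
import numpy as np

def kernel_smoother(x, X, Y, sigma):
    """Evaluate the kernel smoother at a single query point x."""
    d2 = ((x - X) ** 2).sum(axis=1)       # squared distances to training inputs
    w = np.exp(-0.5 * d2 / sigma**2)      # unnormalized Gaussian weights
    w /= w.sum()                          # normalize so the weights sum to 1
    return w @ Y

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))                 # training inputs x_n
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)      # training outputs y_n
print(kernel_smoother(np.array([0.0]), X, Y, sigma=0.5))
```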
- (7 points) What is g(x) if the training set has only one point (N = 1)? Explain.
Sketch the solution in 1D (i.e., when both $x_n, y_n \in \mathbb{R}$). Compare with using a least-squares linear regression.
- (13 points) Prove that, with $N = 2$ points, we can write $g(x) = \alpha(x)\, y_1 + (1 - \alpha(x))\, y_2$, where $\alpha(x)$ can be written using the logistic function. Give the detailed expression for $\alpha(x)$.
Sketch the solution in 1D.
Compare with using a least-squares linear regression.