[Solved] COMS4771 HW4

  • [Estimating parameters with complete and incomplete data] Consider the data generation process for an observation pair (a, b) as follows:
    • a is the outcome of an independent roll of a six-faced (possibly loaded) die. That is, the chance of rolling face 1 is p1, of rolling face 2 is p2, etc., with a total of six distinct possibilities.
    • Given the outcome a, b is drawn independently from the exponential density qa e^(−qa·b) (where qa > 0 and b ≥ 0).
    • List all the parameters of this process. We shall denote the collection of all the parameters by the variable θ (the parameter vector).
    • Suppose we run this process n times independently, and get the sequence:
    • Suppose we run this process n times independently, and get the sequence:

(a1, b1), (a2, b2), …, (an, bn).

What is the likelihood that this sequence was generated by a specific setting of the parameter vector θ?

  • What is the most likely setting of the parameter vector θ given the complete observation sequence? That is, find the Maximum Likelihood Estimate of θ given the observations.
  • What is the probability of the partial (incomplete) observation b1, b2, …, bn given a specific setting of the parameter vector θ?
  • Derive the Expectation Maximization (EM) algorithm to estimate the parameters θ given the incomplete observation sequence b1, b2, …, bn.
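For concreteness, the E- and M-steps for this model can be sketched in code. This is a minimal illustration under the model above (b given a is exponential with rate qa); the function name `em_exponential_mixture`, the initialization scheme, and the fixed iteration count are my own choices, not part of the assignment.

```python
import numpy as np

def em_exponential_mixture(b, k=6, n_iters=100, seed=0):
    """EM for a mixture of k exponentials: p[a] is the (loaded) die
    probability, q[a] the rate of the exponential density q_a * exp(-q_a * b).
    Only the b-sequence is observed (the incomplete-data setting)."""
    rng = np.random.default_rng(seed)
    n = len(b)
    p = np.full(k, 1.0 / k)               # initial die probabilities
    q = rng.uniform(0.5, 2.0, size=k)     # initial exponential rates (arbitrary)
    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, a] = P[a_i = a | b_i, theta]
        joint = p * q * np.exp(-np.outer(b, q))        # shape (n, k)
        gamma = joint / joint.sum(axis=1, keepdims=True)
        # M-step: p from soft counts, q as soft count over soft total of b
        nk = gamma.sum(axis=0)
        p = nk / n
        q = nk / (gamma * b[:, None]).sum(axis=0)
    return p, q
```

The E-step computes the posterior responsibility of each face a for each observed b_i; the M-step re-estimates p and q from those soft counts, mirroring the complete-data MLE with fractional assignments.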
  • [kernelizing k-means] k-means with the Euclidean distance metric assumes that each pair of clusters is linearly separable. This may not be the case. A classic example is two clusters corresponding to data points on two concentric circles in R².
    • Implement Lloyd's method for the k-means algorithm and show the resulting cluster assignment for the dataset depicted above. Give two more examples of datasets in R² for which the optimal k-means setting results in an undesirable clustering. Show the resulting cluster assignment for the two additional example datasets.
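A minimal sketch of plain (non-kernelized) Lloyd's method is given below; the function name `lloyds_kmeans` and the handling of empty clusters are my own choices.

```python
import numpy as np

def lloyds_kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's method: alternate between assigning each point to its
    nearest center and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assignment step: nearest center under squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: mean of each cluster (keep old center if cluster empty)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

On the concentric-circles dataset this converges, but the linear decision boundary between the two centers cannot separate the rings, which is the failure the kernelized version below is meant to fix.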

Let φ denote an explicit non-linear feature space mapping into some inner product space. We will show how one can derive an implicit kernel-based version of Lloyd's method for k-means, where we only operate on data through inner products φ(xi) · φ(xj) = K(xi, xj).

  • Let zij be an indicator that is equal to 1 if φ(xi) is currently assigned to the jth cluster and 0 otherwise (1 ≤ i ≤ n and 1 ≤ j ≤ k). Show that the jth cluster center cj can be written as cj = Σi αij φ(xi), where αij only depends on zij.
  • Given any two data points xi and xj, show that the squared distance ‖φ(xi) − φ(xj)‖² can be computed using only (linear combinations of) inner products.
  • Given the results of parts (ii) and (iii), show how to compute the squared distance ‖φ(xi) − cj‖² using only (linear combinations of) inner products between the data points x1, …, xn.
  • From the results of parts (ii), (iii) and (iv), propose an algorithm for the kernelized version of Lloyd's method for finding k-means.
  • Implement your proposed kernelized k-means method and run it on the three example datasets of part (i). Compare the resulting clusters for various choices of kernels (e.g. linear kernel, quadratic kernel, RBF kernel).

(submit your datasets and kernelized code on Courseworks to receive full credit)
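Putting parts (ii)-(v) together, a kernelized Lloyd's method can be sketched as follows. The key identity is ‖φ(xi) − cj‖² = K(xi, xi) − (2/|Cj|) Σ_{l∈Cj} K(xi, xl) + (1/|Cj|²) Σ_{l,m∈Cj} K(xl, xm), so the centers never need to be formed explicitly. The function names and the random-label initialization are my own choices.

```python
import numpy as np

def kernel_kmeans(K, k, n_iters=100, seed=0):
    """Kernelized Lloyd's method. K is the n-by-n Gram matrix
    K[i, j] = <phi(x_i), phi(x_j)>; squared distances to each (implicit)
    center are expressed with inner products only."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iters):
        d2 = np.empty((n, k))
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if len(idx) == 0:
                d2[:, j] = np.inf          # empty cluster: never chosen
                continue
            # ||phi(x_i) - c_j||^2 = K_ii - 2*mean_l K_il + mean_{l,m} K_lm
            d2[:, j] = (np.diag(K)
                        - 2.0 * K[:, idx].mean(axis=1)
                        + K[np.ix_(idx, idx)].mean())
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def rbf_kernel(X, gamma=1.0):
    """Gram matrix for the RBF kernel exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)
```

With the linear kernel K = X Xᵀ this reduces to ordinary Lloyd's method; with an RBF kernel the implicit clusters can be non-linearly separated in input space, e.g. the two concentric circles.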

  • [Randomized decision rules] Consider a reward-based prediction framework, where given a continuous state-space X and a discrete action-space A, a learning algorithm f : X → A decides to perform action a ∈ A when it is in state x ∈ X. The learner then receives a numerical (possibly negative) reward R(x, a) for performing action a ∈ A in state x ∈ X. (Note: such reward-based frameworks are common in robotics, where the learning agent, i.e., the robot, performs some (possibly randomized) action a in some situation x and receives reward R(x, a). The goal of the learning agent is to maximize its expected reward.)
    • Assume that f is allowed to take a randomized action in any state x. That is, in any state x ∈ X, f chooses to perform action a with probability P[a|x]. Show that the average (over the states) expected reward received by f, i.e., Ex,f(x)[R(x, f(x))], is Ex[ Σa∈A P[a|x] R(x, a) ].
    • Show that the expected reward is maximized by choosing P[a|x] = 1 for the action a associated with the maximal reward R(x,a) (for a given state x). This shows us that there is no benefit in randomizing the best decision rule.
    • Can one benefit from randomizing a suboptimal rule? Briefly explain.
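A small numeric illustration of part (ii): for a fixed state x, the expected reward Σa P[a|x] R(x, a) is linear in the probability vector P[·|x], so it is maximized at a vertex of the probability simplex, i.e. by a deterministic rule. The rewards below are made-up toy values.

```python
import numpy as np

# Toy rewards R(x, a) for one fixed state x and three actions (made-up values).
R = np.array([3.0, -1.0, 2.0])

def expected_reward(P):
    """Expected reward sum_a P[a|x] * R(x, a) of a randomized rule P."""
    return float(P @ R)

# Deterministic rule: all probability mass on the highest-reward action.
best = np.zeros(len(R))
best[R.argmax()] = 1.0

# Sample many randomized rules (points of the probability simplex); since the
# objective is linear, none of them can exceed the deterministic optimum.
rng = np.random.default_rng(0)
randomized = rng.dirichlet(np.ones(len(R)), size=1000)
```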
  • [From distances to embeddings] Your friend from overseas is visiting you and asks you for the geographical locations of popular US cities on a map. Not having access to a US map, you realize that you cannot provide your friend with accurate information. You recall that you have access to the relative distances between nine popular US cities, given by the following distance matrix D:
Distances (D) BOS NYC DC MIA CHI SEA SF LA DEN
BOS 0 206 429 1504 963 2976 3095 2979 1949
NYC 206 0 233 1308 802 2815 2934 2786 1771
DC 429 233 0 1075 671 2684 2799 2631 1616
MIA 1504 1308 1075 0 1329 3273 3053 2687 2037
CHI 963 802 671 1329 0 2013 2142 2054 996
SEA 2976 2815 2684 3273 2013 0 808 1131 1307
SF 3095 2934 2799 3053 2142 808 0 379 1235
LA 2979 2786 2631 2687 2054 1131 379 0 1059
DEN 1949 1771 1616 2037 996 1307 1235 1059 0

Being a machine learning student, you believe that it may be possible to infer the locations of these cities from the distance data. To find an embedding of these nine cities on a two dimensional map, you decide to solve it as an optimization problem as follows.

You associate a two-dimensional variable xi with the unknown latitude and longitude values for each of the nine cities (that is, x1 is the lat/lon value for BOS, x2 is the lat/lon value for NYC, etc.). You write down the (unconstrained) optimization problem

minimize Σi,j (‖xi − xj‖ − Dij)²,

where Σi,j (‖xi − xj‖ − Dij)² denotes the embedding discrepancy function.

  • What is the derivative of the discrepancy function with respect to a location xi?
  • Write a program in your preferred language to find an optimal setting of locations x1, …, x9. You must submit your code to Courseworks to receive full credit.
  • Plot the result of the optimization showing the estimated locations of the nine cities.
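For parts (i) and (ii), assuming the sum runs over all ordered pairs (i, j) and D is symmetric, differentiating the discrepancy term by term gives the derivative 4 Σj≠i (‖xi − xj‖ − Dij)(xi − xj)/‖xi − xj‖ with respect to xi (defined wherever xi ≠ xj). A plain gradient-descent sketch follows; the function name `embed_cities`, the learning rate, and the random initialization are my own choices.

```python
import numpy as np

def embed_cities(D, dim=2, lr=0.005, n_iters=10000, seed=0):
    """Gradient descent on f(x) = sum_{i,j} (||x_i - x_j|| - D_ij)^2,
    with the sum over all ordered pairs and D symmetric."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(scale=D.max() / 4, size=(n, dim))   # random start
    for _ in range(n_iters):
        diff = X[:, None, :] - X[None, :, :]           # x_i - x_j
        dist = np.sqrt((diff ** 2).sum(axis=2))
        np.fill_diagonal(dist, 1.0)                    # avoid divide-by-zero
        coef = 4.0 * (dist - D) / dist                 # scalar factor per pair
        np.fill_diagonal(coef, 0.0)                    # no self-interaction
        grad = (coef[:, :, None] * diff).sum(axis=1)   # gradient w.r.t. each x_i
        X -= lr * grad
    return X
```

Note that the discrepancy is invariant to translation, rotation, and reflection of the whole configuration, so the recovered layout matches true geography only up to such a transformation.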

(here is a sample code to plot the city locations in Matlab)

>> cities = {'BOS','NYC','DC','MIA','CHI','SEA','SF','LA','DEN'};

>> locs = [x1;x2;x3;x4;x5;x6;x7;x8;x9];

>> figure; text(locs(:,1), locs(:,2), cities);

What can you say about your result of the estimated locations compared to the actual geographical locations of these cities?
