[Solved] COMS4771 HW4

  • [Estimating parameters with complete and incomplete data] Consider the data generation process for an observation pair (a, b) as follows:
    • a is the outcome of an independent roll of a six-faced (possibly loaded) die. That is, the chance of rolling face 1 is p1, of rolling face 2 is p2, etc., with a total of six distinct possibilities.
    • Given the outcome a, b is drawn independently from the exponential density qa e^(−qa·b) (where qa > 0 and b ≥ 0).
    • List all the parameters of this process. We shall denote the collection of all the parameters by the variable θ (the parameter vector).
    • Suppose we run this process n times independently, and get the sequence:
    • Suppose we run this process n times independently, and get the sequence:

(a1, b1), (a2, b2), …, (an, bn).

What is the likelihood that this sequence was generated by a specific setting of the parameter vector θ?

  • What is the most likely setting of the parameter vector θ given the complete observation sequence? That is, find the Maximum Likelihood Estimate of θ given the observations.
  • What is the probability of the partial (incomplete) observation b1, b2, …, bn given a specific setting of the parameter vector θ?
  • Derive the Expectation Maximization (EM) algorithm to estimate the parameters θ given the incomplete observation sequence b1, b2, …, bn.
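For concreteness, the E- and M-steps for this model can be sketched in code. This is a minimal illustration under the model above (b given a is exponential with rate qa); the function name `em_exponential_mixture`, the initialization scheme, and the fixed iteration count are my own choices, not part of the assignment.

```python
import numpy as np

def em_exponential_mixture(b, k=6, n_iters=100, seed=0):
    """EM for a mixture of k exponentials: p[a] is the (loaded) die
    probability, q[a] the rate of the exponential density q_a * exp(-q_a * b).
    Only the b-sequence is observed (the incomplete-data setting)."""
    rng = np.random.default_rng(seed)
    n = len(b)
    p = np.full(k, 1.0 / k)               # initial die probabilities
    q = rng.uniform(0.5, 2.0, size=k)     # initial exponential rates (arbitrary)
    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, a] = P[a_i = a | b_i, theta]
        joint = p * q * np.exp(-np.outer(b, q))        # shape (n, k)
        gamma = joint / joint.sum(axis=1, keepdims=True)
        # M-step: p from soft counts, q as soft count over soft total of b
        nk = gamma.sum(axis=0)
        p = nk / n
        q = nk / (gamma * b[:, None]).sum(axis=0)
    return p, q
```

The E-step computes the posterior responsibility of each face a for each observed b_i; the M-step re-estimates p and q from those soft counts, mirroring the complete-data MLE with fractional assignments.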
  • [kernelizing k-means] k-means with the Euclidean distance metric assumes that each pair of clusters is linearly separable. This may not be the case. A classic example is two clusters corresponding to data points on two concentric circles in R².
    • Implement Lloyd's method for the k-means algorithm and show the resulting cluster assignment for the dataset depicted above. Give two more examples of datasets in R² for which the optimal k-means setting results in an undesirable clustering. Show the resulting cluster assignment for the two additional example datasets.
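A minimal sketch of plain (non-kernelized) Lloyd's method is given below; the function name `lloyds_kmeans` and the handling of empty clusters are my own choices.

```python
import numpy as np

def lloyds_kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's method: alternate between assigning each point to its
    nearest center and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assignment step: nearest center under squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: mean of each cluster (keep old center if cluster empty)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

On the concentric-circles dataset this converges, but the linear decision boundary between the two centers cannot separate the rings, which is the failure the kernelized version below is meant to fix.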

Let φ denote an explicit non-linear feature space mapping into some inner product space. We will show how one can derive an implicit kernel-based version of Lloyd's method for k-means, where we only operate on data through inner products φ(xi) · φ(xj) = K(xi, xj).

  • Let zij be an indicator that is equal to 1 if φ(xi) is currently assigned to the jth cluster and 0 otherwise (1 ≤ i ≤ n and 1 ≤ j ≤ k). Show that the jth cluster center cj can be written as cj = Σi αij φ(xi), where αij only depends on zij.
  • Given any two data points xi and xj, show that the squared distance ‖φ(xi) − φ(xj)‖² can be computed using only (linear combinations of) inner products.
  • Given the results of parts (ii) and (iii), show how to compute the squared distance ‖φ(xi) − cj‖² using only (linear combinations of) inner products between the data points x1, …, xn.
  • From the results of parts (ii), (iii) and (iv), propose an algorithm for the kernelized version of Lloyd's method for finding k-means.
  • Implement your proposed kernelized k-means method and run it on the three example datasets of part (i). Compare the resulting clusters for various choices of kernels (e.g. linear kernel, quadratic kernel, RBF kernel).

(submit your datasets and kernelized code on Courseworks to receive full credit)
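Putting parts (ii)-(v) together, a kernelized Lloyd's method can be sketched as follows. The key identity is ‖φ(xi) − cj‖² = K(xi, xi) − (2/|Cj|) Σ_{l∈Cj} K(xi, xl) + (1/|Cj|²) Σ_{l,m∈Cj} K(xl, xm), so the centers never need to be formed explicitly. The function names and the random-label initialization are my own choices.

```python
import numpy as np

def kernel_kmeans(K, k, n_iters=100, seed=0):
    """Kernelized Lloyd's method. K is the n-by-n Gram matrix
    K[i, j] = <phi(x_i), phi(x_j)>; squared distances to each (implicit)
    center are expressed with inner products only."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iters):
        d2 = np.empty((n, k))
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if len(idx) == 0:
                d2[:, j] = np.inf          # empty cluster: never chosen
                continue
            # ||phi(x_i) - c_j||^2 = K_ii - 2*mean_l K_il + mean_{l,m} K_lm
            d2[:, j] = (np.diag(K)
                        - 2.0 * K[:, idx].mean(axis=1)
                        + K[np.ix_(idx, idx)].mean())
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def rbf_kernel(X, gamma=1.0):
    """Gram matrix for the RBF kernel exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)
```

With the linear kernel K = X Xᵀ this reduces to ordinary Lloyd's method; with an RBF kernel the implicit clusters can be non-linearly separated in input space, e.g. the two concentric circles.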

  • [Randomized decision rules] Consider a reward-based prediction framework, where given a continuous state-space X and a discrete action-space A, a learning algorithm f : X → A decides to perform action a ∈ A when it is in state x ∈ X. The learner then receives a numerical (possibly negative) reward R(x, a) for performing action a ∈ A in state x ∈ X. (Note: such reward-based frameworks are common in robotics, where the learning agent, i.e., the robot, performs some (possibly randomized) action a in some situation x and receives reward R(x, a). The goal of the learning agent is to maximize its expected reward.)
    • Assume that f is allowed to take a randomized action in any state x. That is, in any state x ∈ X, f chooses to perform action a with probability P[a|x]. Show that the average (over the states) expected reward received by f, i.e., Ex,f(x)[R(x, f(x))], is Ex[ Σa∈A P[a|x] R(x, a) ].
    • Show that the expected reward is maximized by choosing P[a|x] = 1 for the action a associated with the maximal reward R(x,a) (for a given state x). This shows us that there is no benefit in randomizing the best decision rule.
    • Can one benefit from randomizing a suboptimal rule? Briefly explain.
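A small numeric illustration of part (ii): for a fixed state x, the expected reward Σa P[a|x] R(x, a) is linear in the probability vector P[·|x], so it is maximized at a vertex of the probability simplex, i.e. by a deterministic rule. The rewards below are made-up toy values.

```python
import numpy as np

# Toy rewards R(x, a) for one fixed state x and three actions (made-up values).
R = np.array([3.0, -1.0, 2.0])

def expected_reward(P):
    """Expected reward sum_a P[a|x] * R(x, a) of a randomized rule P."""
    return float(P @ R)

# Deterministic rule: all probability mass on the highest-reward action.
best = np.zeros(len(R))
best[R.argmax()] = 1.0

# Sample many randomized rules (points of the probability simplex); since the
# objective is linear, none of them can exceed the deterministic optimum.
rng = np.random.default_rng(0)
randomized = rng.dirichlet(np.ones(len(R)), size=1000)
```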
  • [From distances to embeddings] Your friend from overseas is visiting you and asks you for the geographical locations of popular US cities on a map. Not having access to a US map, you realize that you cannot provide your friend with accurate information. You recall that you have access to the relative distances between nine popular US cities, given by the following distance matrix D:
Distances (D) BOS NYC DC MIA CHI SEA SF LA DEN
BOS 0 206 429 1504 963 2976 3095 2979 1949
NYC 206 0 233 1308 802 2815 2934 2786 1771
DC 429 233 0 1075 671 2684 2799 2631 1616
MIA 1504 1308 1075 0 1329 3273 3053 2687 2037
CHI 963 802 671 1329 0 2013 2142 2054 996
SEA 2976 2815 2684 3273 2013 0 808 1131 1307
SF 3095 2934 2799 3053 2142 808 0 379 1235
LA 2979 2786 2631 2687 2054 1131 379 0 1059
DEN 1949 1771 1616 2037 996 1307 1235 1059 0

Being a machine learning student, you believe that it may be possible to infer the locations of these cities from the distance data. To find an embedding of these nine cities on a two dimensional map, you decide to solve it as an optimization problem as follows.

You associate a two-dimensional variable xi with the unknown latitude and longitude values for each of the nine cities (that is, x1 is the lat/lon value for BOS, x2 is the lat/lon value for NYC, etc.). You write down the (unconstrained) optimization problem

minimize Σi,j (‖xi − xj‖ − Dij)²,

where Σi,j (‖xi − xj‖ − Dij)² denotes the embedding discrepancy function.

  • What is the derivative of the discrepancy function with respect to a location xi?
  • Write a program in your preferred language to find an optimal setting of locations x1, …, x9. You must submit your code to Courseworks to receive full credit.
  • Plot the result of the optimization showing the estimated locations of the nine cities.
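For parts (i) and (ii), assuming the sum runs over all ordered pairs (i, j) and D is symmetric, differentiating the discrepancy term by term gives the derivative 4 Σj≠i (‖xi − xj‖ − Dij)(xi − xj)/‖xi − xj‖ with respect to xi (defined wherever xi ≠ xj). A plain gradient-descent sketch follows; the function name `embed_cities`, the learning rate, and the random initialization are my own choices.

```python
import numpy as np

def embed_cities(D, dim=2, lr=0.005, n_iters=10000, seed=0):
    """Gradient descent on f(x) = sum_{i,j} (||x_i - x_j|| - D_ij)^2,
    with the sum over all ordered pairs and D symmetric."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(scale=D.max() / 4, size=(n, dim))   # random start
    for _ in range(n_iters):
        diff = X[:, None, :] - X[None, :, :]           # x_i - x_j
        dist = np.sqrt((diff ** 2).sum(axis=2))
        np.fill_diagonal(dist, 1.0)                    # avoid divide-by-zero
        coef = 4.0 * (dist - D) / dist                 # scalar factor per pair
        np.fill_diagonal(coef, 0.0)                    # no self-interaction
        grad = (coef[:, :, None] * diff).sum(axis=1)   # gradient w.r.t. each x_i
        X -= lr * grad
    return X
```

Note that the discrepancy is invariant to translation, rotation, and reflection of the whole configuration, so the recovered layout matches true geography only up to such a transformation.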

(here is a sample code to plot the city locations in Matlab)

>> cities = {'BOS','NYC','DC','MIA','CHI','SEA','SF','LA','DEN'};

>> locs = [x1;x2;x3;x4;x5;x6;x7;x8;x9];

>> figure; text(locs(:,1), locs(:,2), cities);

What can you say about your result of the estimated locations compared to the actual geographical locations of these cities?
