
Machine Learning Theory (CSC 482A/581A)

Problem Set 2

Instructions:

  • You must write up your solutions individually.
  • You may have high-level discussions with up to 2 other students registered in the course. If you discuss problems with others, include at the top of your submission: their names, V#s, and the problems discussed.
  • Your solutions should be clear and concise (and legible!). You are strongly encouraged to type up your solutions in LaTeX. For any problems where you only have a partial solution, be clear about any parts of your solution for which you have low confidence.
  • If submitted through conneX, the due date is Friday, February 23rd, 7pm. This is a hard deadline. If submitted in writing, submit at the beginning of class (12:30pm) Friday, February 23rd. You are thus incentivized to submit online (using LaTeX) as you get more time!

Questions:

  1. Let $\mathcal{X} = \mathbb{R}^2$ and take $\mathcal{F}$ to be the set of all convex polygons; the classifier corresponding to a convex polygon labels as positive all points inside the polygon (including the boundary) and labels all other points as negative. Prove that $\mathrm{VCdim}(\mathcal{F}) = \infty$.
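Though not required for a proof, the standard shattering construction behind this claim, namely that $n$ points in convex position (e.g., on the unit circle) can be given any labeling by the convex hull of the positively labeled points, can be checked numerically. A NumPy sketch (the choice of $n = 8$ equally spaced points is illustrative only):

```python
import numpy as np
from itertools import combinations

n = 8
ang = 2 * np.pi * np.arange(n) / n
pts = np.column_stack([np.cos(ang), np.sin(ang)])  # n points on the unit circle

def in_hull(p, poly):
    # point-in-convex-polygon test; poly vertices listed in CCW order
    if len(poly) == 1:
        return np.allclose(p, poly[0])
    a, b = poly, np.roll(poly, -1, axis=0)
    cross = (b[:, 0] - a[:, 0]) * (p[1] - a[:, 1]) - (b[:, 1] - a[:, 1]) * (p[0] - a[:, 0])
    return bool(np.all(cross >= -1e-12))  # inside iff left of (or on) every edge

# Every nonempty labeling is realized by the hull of the positive points
# (the all-negative labeling is realized trivially by an empty polygon).
shattered = all(
    all(in_hull(pts[k], pts[list(S)]) == (k in S) for k in range(n))
    for r in range(1, n + 1)
    for S in combinations(range(n), r)
)
print(shattered)  # True: these n points are shattered
```

Since $n$ here is arbitrary, no finite VC dimension can hold for this class.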

  2. Let $\mathcal{F}$ be the class of linear separators in $d$ dimensions, so that $\mathcal{F} = \{f_{w,b} : w \in \mathbb{R}^d,\, b \in \mathbb{R}\}$ with $f_{w,b}(x) = \mathbf{1}[\langle w, x \rangle + b \ge 0]$.

     (a) Prove that $\mathrm{VCdim}(\mathcal{F}) \ge d + 1$.
     (b) Radon's Theorem states that any set of $d + 2$ points in $\mathbb{R}^d$ can be partitioned into two sets $A$ and $B$ such that the convex hulls of $A$ and $B$ intersect. Using Radon's Theorem, prove that $\mathrm{VCdim}(\mathcal{F}) \le d + 1$ (and hence, combined with part (a), we may conclude that $\mathrm{VCdim}(\mathcal{F}) = d + 1$).
     (c) Next, prove Radon's Theorem. Any valid proof is allowed. Here is the start of one potential proof. Recall from linear algebra that any $d + 1$ points $x_1, \ldots, x_{d+1} \in \mathbb{R}^d$ must be linearly dependent, i.e., there exists a vector $\lambda \in \mathbb{R}^{d+1}$ not equal to the zero vector such that

       $$\sum_{j=1}^{d+1} \lambda_j x_j = 0.$$

       The hint is to first prove that any set of $d + 2$ points $x_1, \ldots, x_{d+2} \in \mathbb{R}^d$ must be affinely dependent, meaning that there exists a vector $\lambda \in \mathbb{R}^{d+2}$ not equal to the zero vector such that

       $$\sum_{j=1}^{d+2} \lambda_j x_j = 0 \quad \text{and} \quad \sum_{j=1}^{d+2} \lambda_j = 0.$$
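The two conditions in the hint lead directly to Radon's partition: splitting the indices by the sign of $\lambda_j$ and normalizing produces a point lying in both convex hulls. A numerical sanity check of this construction (a NumPy sketch; the random points, seed, and $d = 2$ are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
pts = rng.standard_normal((d + 2, d))  # d + 2 generic points in R^d

# Stack coordinates over a row of ones: a nonzero null vector lam of this
# (d+1) x (d+2) matrix satisfies sum_j lam_j x_j = 0 and sum_j lam_j = 0.
M = np.vstack([pts.T, np.ones(d + 2)])
_, _, Vt = np.linalg.svd(M)
lam = Vt[-1]  # right-singular vector for the (near-)zero singular value

pos = lam > 0            # partition indices by the sign of lam_j
s = lam[pos].sum()       # equals -sum of the negative lam_j, since sum lam_j = 0
p_pos = (lam[pos] @ pts[pos]) / s    # convex combination of positive-side points
p_neg = (-lam[~pos] @ pts[~pos]) / s  # convex combination of the other side
print(np.allclose(p_pos, p_neg))  # True: one point in both convex hulls
```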


  3. Suppose that $P$ is a probability distribution over $\mathbb{R}^d$, and let $X_1, \ldots, X_n, X_{n+1}$ be i.i.d. samples with distribution $P$. We say that $(X_1, \ldots, X_n)$ is $s$-sparse if

     $$\|X_j\|_0 \le s \quad \text{for all } j \in [n],$$

     where, for any vector $x$, the $\ell_0$ norm $\|x\|_0$ is defined as the number of non-zero components in $x$.
     Let $\hat{s}$ be the minimum value of $s \in \{0, 1, \ldots, d\}$ such that $(X_1, \ldots, X_n)$ is $s$-sparse.

     (a) Derive an upper bound (which holds with high probability over $X_1, \ldots, X_n$) on the probability that $X_{n+1}$ is not $\hat{s}$-sparse. Specifically, your bound should be of the form: with probability at least $1 - \delta$,

       $$\Pr\left(\|X_{n+1}\|_0 > \hat{s}\right) = O\!\left(\frac{\log(1/\delta)}{n}\right).$$

       The bound can also depend on the dimension, but the rate with respect to $n$ cannot be worse than $O(1/n)$ (so $O\!\left(\frac{\log n}{n}\right)$ is not allowed).

     (b) If your upper bound from part (a) depended on the dimension $d$, it degrades severely as $d \to \infty$. Derive an upper bound that is dimension-free. Unlike part (a), the rate with respect to $n$ can now be $O\!\left(\frac{\log n}{n}\right)$.
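For intuition, the probability in part (a) can be estimated by simulation: compute $\hat{s}$ from the first $n$ samples and check how often the $(n+1)$-th sample exceeds it. A NumPy sketch under a hypothetical sparse distribution (each coordinate nonzero with probability 0.3, an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, trials = 10, 200, 2000

def sample(m):
    # hypothetical sparse distribution: each coordinate nonzero w.p. 0.3
    mask = rng.random((m, d)) < 0.3
    return mask * rng.standard_normal((m, d))

hits = 0
for _ in range(trials):
    X = sample(n + 1)
    # s_hat = min s such that (X_1, ..., X_n) is s-sparse = max_j ||X_j||_0
    s_hat = int(np.max(np.count_nonzero(X[:n], axis=1)))
    hits += np.count_nonzero(X[n]) > s_hat  # is X_{n+1} not s_hat-sparse?
print(hits / trials)  # empirical Pr(||X_{n+1}||_0 > s_hat); small for large n
```

By exchangeability, this probability is at most about $1/(n+1)$, which matches the $O(1/n)$ rate the problem asks for.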

  4. Suppose that $P$ is a probability distribution over the unit Euclidean ball in $\mathbb{R}^d$, and let $X_1, \ldots, X_n$ be i.i.d. samples with distribution $P$.

     Using tools from class, prove that the average distance over all pairs of the $n$ points,

     $$\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \|X_i - X_j\|_2,$$

     is tightly concentrated around its expectation $\mathbb{E}_{X,Y \sim P} \|X - Y\|_2$. Specifically, you should show that with probability at least $1 - \delta$,

     $$\left| \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \|X_i - X_j\|_2 \;-\; \mathbb{E}_{X,Y \sim P} \|X - Y\|_2 \right| = O\!\left(\sqrt{\frac{\log(1/\delta)}{n}}\right).$$
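The claimed $\sqrt{\log(1/\delta)/n}$ concentration can be observed empirically: the typical deviation of the average pairwise distance from its mean shrinks as $n$ grows, roughly like $1/\sqrt{n}$. A NumPy sketch, taking $P$ uniform on the unit ball (an illustrative choice of $P$, not prescribed by the problem):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5

def sample_ball(m):
    # uniform on the unit Euclidean ball: random direction times radius U^(1/d)
    g = rng.standard_normal((m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * rng.random((m, 1)) ** (1.0 / d)

def avg_pairwise(X):
    # (n choose 2)^{-1} * sum_{i<j} ||X_i - X_j||_2
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return D[np.triu_indices(len(X), k=1)].mean()

# proxy for E ||X - Y||_2, estimated from many independent pairs
mean_est = np.linalg.norm(sample_ball(100000) - sample_ball(100000), axis=1).mean()

mean_dev = {}
for n in (50, 200, 800):
    devs = [abs(avg_pairwise(sample_ball(n)) - mean_est) for _ in range(60)]
    mean_dev[n] = float(np.mean(devs))
print(mean_dev)  # typical deviation shrinks roughly like 1/sqrt(n)
```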
