Problems
1. LFD Problem 1.3
2. Out of textbook
3. LFD Problem 1.7
4. LFD Problem 1.8
CSE417: Introduction to Machine Learning
Problem Set 1
Q1: LFD Problem 1.3
(a)
Since $w^*$ is an optimal set of weights achieving linear separation, $w^{*T} x_n$ must have the same sign as $y_n$ for all $1 \le n \le N$.
Therefore, for all $1 \le n \le N$, $y_n(w^{*T} x_n) > 0$, which implies $\min_{1 \le n \le N} y_n(w^{*T} x_n) > 0$, i.e. $\rho > 0$.
(b)
Given the update rule $w(t+1) = w(t) + y(t)x(t)$, transpose both sides and multiply by $w^*$:
$$w(t+1)^T = w(t)^T + (y(t)x(t))^T$$
$$w(t+1)^T w^* = w(t)^T w^* + (y(t)x(t))^T w^*$$
$$w(t+1)^T w^* = w(t)^T w^* + y(t)\,(x(t)^T w^*) \qquad \text{since $y(t)$ is $1 \times 1$}$$
$$w(t+1)^T w^* = w(t)^T w^* + y(t)\,(w^{*T} x(t)) \qquad \text{since $w^{*T} x(t)$ is $1 \times 1$, hence symmetric}$$
By the definition $\rho = \min_{1 \le n \le N} y_n(w^{*T} x_n)$, every point selected at step $t$ satisfies $\rho \le y(t)\,(w^{*T} x(t))$.
Then we can conclude that:
$$w(t+1)^T w^* \ge w(t)^T w^* + \rho$$
Base case: $t = 0$. Since we assumed $w(0) = 0$, we have:
$$w(0)^T w^* = 0 \ge 0 \cdot \rho = 0$$
Induction: suppose the inequality is valid for $t = n$:
$$w(n)^T w^* \ge n\rho$$
Incorporating the inequality deduced above:
$$w(n+1)^T w^* \ge w(n)^T w^* + \rho \ge n\rho + \rho = (n+1)\rho$$
so the inequality must also be valid for $t = n+1$. Thus $w(t)^T w^* \ge t\rho$ is proved.
(c)
Given the update rule $w(t+1) = w(t) + y(t)x(t)$, multiply each side of the equation by its own transpose:
$$w(t+1)^T w(t+1) = (w(t) + y(t)x(t))^T\,(w(t) + y(t)x(t))$$
$$\|w(t+1)\|^2 = \|w(t)\|^2 + 2\,y(t)\,w(t)^T x(t) + y(t)^2\|x(t)\|^2$$
Since $y(t)^2 = 1$ for any $y(t)$, and $2\,y(t)\,w(t)^T x(t) \le 0$ because $x(t)$ is a point misclassified by $w(t)$, we have:
$$\|w(t+1)\|^2 \le \|w(t)\|^2 + y(t)^2\|x(t)\|^2 = \|w(t)\|^2 + \|x(t)\|^2$$
(d)
Proof:
Let $R = \max_{1 \le n \le N}\|x_n\|$.
Base case: for $t = 0$, $\|w(0)\|^2 = 0 \le 0 \cdot R^2 = 0$.
Induction:
Given that $\|w(t)\|^2 \le tR^2$, and using the conclusion of part (c) together with $\|x(t)\|^2 \le R^2$, we have:
$$\|w(t+1)\|^2 \le \|w(t)\|^2 + \|x(t)\|^2$$
$$\|w(t+1)\|^2 \le \|w(t)\|^2 + R^2$$
$$\|w(t+1)\|^2 \le tR^2 + R^2$$
$$\|w(t+1)\|^2 \le (t+1)R^2$$
(e)
Proof: Using the conclusion in part (b), $w(t)^T w^* \ge t\rho$, and given the conclusion in (d), $\|w(t)\| \le \sqrt{t}\,R$.
Since $\frac{w(t)^T w^*}{\|w(t)\|\,\|w^*\|} = \cos\theta$, where $\theta$ represents the angle between $w(t)$ and $w^*$, we must have $\cos\theta \le 1$, and we can rewrite the inequality as:
$$\frac{t\rho}{\sqrt{t}\,R\,\|w^*\|} \le \frac{w(t)^T w^*}{\|w(t)\|\,\|w^*\|} \le 1$$
Given that $t \ge 0$, this yields $\sqrt{t}\,\rho \le R\,\|w^*\|$, hence:
$$t \le \frac{R^2\|w^*\|^2}{\rho^2}$$
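As a numeric sanity check of the bound $t \le R^2\|w^*\|^2/\rho^2$, the sketch below runs PLA on synthetic separable data. The target weights `w_star`, the data distribution, and the dimensions are hypothetical choices for illustration, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a known separating weight vector w_star and
# linearly separable data generated from it.
w_star = np.array([0.5, -1.0, 2.0])
X = rng.uniform(-1, 1, size=(100, 3))
X[:, 0] = 1.0                          # bias coordinate x0 = 1
y = np.sign(X @ w_star)

# Quantities from parts (a) and (d)
rho = np.min(y * (X @ w_star))         # rho > 0 since w_star separates the data
R = np.max(np.linalg.norm(X, axis=1))
bound = R**2 * np.linalg.norm(w_star)**2 / rho**2

# PLA with w(0) = 0 and update rule w(t+1) = w(t) + y(t) x(t)
w = np.zeros(3)
t = 0
while True:
    mis = np.flatnonzero(np.sign(X @ w) != y)
    if mis.size == 0:
        break
    i = mis[0]                         # pick any misclassified point
    w = w + y[i] * X[i]
    t += 1

print(f"converged after t = {t} updates; bound = {bound:.1f}")
```

By the theorem, the observed number of updates must fall at or below the bound for any choice of misclassified point at each step.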
Q2
The first plot shows the number of operations for each iteration, and the second plot is a histogram of the log difference. Plots attached below:
Q3: LFD Problem 1.7
(a)
For $\mu = 0.05$:
$$P[\nu = 0 \mid N = 10, \mu = 0.05] = 0.95^{10} = 0.5987$$
1 coin: $P = 0.5987$
10 coins: $P = 1 - (1 - 0.95^{10})^{10} = 0.9999$
1000 coins: $P = 1 - (1 - 0.95^{10})^{1000} \approx 1$
1,000,000 coins: $P = 1 - (1 - 0.95^{10})^{1000000} \approx 1$
For $\mu = 0.8$:
$$P[\nu = 0 \mid N = 10, \mu = 0.8] = 0.2^{10} = 1.024 \times 10^{-7}$$
1 coin: $P = 1.024 \times 10^{-7}$
10 coins: $P = 1 - (1 - 0.2^{10})^{10} = 1.024 \times 10^{-6}$
1000 coins: $P = 1 - (1 - 0.2^{10})^{1000} = 1.024 \times 10^{-4}$
1,000,000 coins: $P = 1 - (1 - 0.2^{10})^{1000000} = 0.09733$
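The values above can be reproduced with a short computation; the helper names below are illustrative, not from the original solution.

```python
def p_no_heads(mu, n=10):
    """P[nu = 0 | n tosses, P(heads) = mu] for a single coin."""
    return (1 - mu) ** n

def p_some_coin_no_heads(mu, coins, n=10):
    """P[at least one of `coins` independent coins shows no heads]."""
    return 1 - (1 - p_no_heads(mu, n)) ** coins

for mu in (0.05, 0.8):
    for coins in (1, 10, 1000, 1_000_000):
        print(f"mu={mu}, {coins} coins: P = {p_some_coin_no_heads(mu, coins):.6g}")
```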
(b)
The result should be equivalent to $P[\max_i |\nu_i - \mu_i| > \epsilon] = 1 - (1 - P[|\nu - \mu| > \epsilon])^2$ for each $\epsilon$, which produces a step function.
Using the Hoeffding bound for 2 coins, $P[\max_i |\nu_i - \mu_i| > \epsilon] \le 2 \cdot 2e^{-2\epsilon^2 N}$.
Plot attached below:
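A sketch of the comparison behind the plot, assuming the setup of LFD Problem 1.7(b): two fair coins ($\mu = 0.5$) with $N = 6$ tosses each; those parameters are taken from the textbook problem, not stated in this writeup.

```python
from math import comb, exp

N, mu = 6, 0.5  # assumed setup: 6 tosses per coin, fair coins

def p_single(eps):
    """Exact P[|nu - mu| > eps] for one coin (binomial sum)."""
    return sum(comb(N, k) * mu**k * (1 - mu)**(N - k)
               for k in range(N + 1) if abs(k / N - mu) > eps)

def p_max(eps):
    """P[max over the two independent coins of |nu - mu| > eps]."""
    return 1 - (1 - p_single(eps))**2

def hoeffding(eps):
    """Union bound over 2 coins, each bounded by 2 exp(-2 eps^2 N)."""
    return 2 * 2 * exp(-2 * eps**2 * N)

for eps in (0.0, 0.25, 0.5):
    print(f"eps={eps}: exact={p_max(eps):.4f}, bound={hoeffding(eps):.4f}")
```

The exact curve is a step function in $\epsilon$ (it only changes where $|k/N - \mu|$ crosses $\epsilon$), while the Hoeffding bound decays smoothly and stays above it.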
Q4: LFD Problem 1.8
(a)
By definition, for a nonnegative random variable $t$ with density $f$, we have that:
$$E[t] = \int_0^\infty s\,f(s)\,ds \ge \int_\alpha^\infty s\,f(s)\,ds \ge \alpha \int_\alpha^\infty f(s)\,ds = \alpha\,P[t \ge \alpha]$$
Then for each $\alpha > 0$, we have:
$$P[t \ge \alpha] \le \frac{E[t]}{\alpha}$$
Proved.
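A Monte Carlo spot-check of this inequality (Markov's inequality); the choice of exponential samples is an assumed example, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.exponential(1.0, size=100_000)   # nonnegative samples, E[t] = 1

for alpha in (0.5, 1.0, 2.0, 4.0):
    p_emp = np.mean(t >= alpha)          # empirical P[t >= alpha]
    markov = t.mean() / alpha            # bound E[t] / alpha
    print(f"alpha={alpha}: P[t>=alpha]={p_emp:.4f} <= {markov:.4f}")
    assert p_emp <= markov
```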
(b)
By definition, $E[(u - \mu)^2] = \sigma^2$. Then, applying the conclusion in part (a) to the nonnegative random variable $t = (u - \mu)^2$, we have:
$$P[(u - \mu)^2 \ge \alpha] \le \frac{E[(u - \mu)^2]}{\alpha} = \frac{\sigma^2}{\alpha}$$
(c)
By definition, for $N$ iid random variables $u_1, u_2, \ldots, u_N$, each with $E[u_n] = \mu$ and $\mathrm{Var}(u_n) = \sigma^2$ for $1 \le n \le N$,
let $u = \frac{1}{N}\sum_{n=1}^N u_n$. We have that:
$$E[u] = \frac{1}{N}\sum_{n=1}^N E[u_n] = \mu$$
And that:
$$\mathrm{Var}(u) = \frac{1}{N^2}\sum_{n=1}^N \mathrm{Var}(u_n) = \frac{\sigma^2}{N}$$
Then, similarly to part (b):
$$P[(u - \mu)^2 \ge \alpha] \le \frac{\mathrm{Var}(u)}{\alpha} = \frac{\sigma^2}{N\alpha}$$
Proved.
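A numeric spot-check of the bound $P[(u - \mu)^2 \ge \alpha] \le \sigma^2/(N\alpha)$; the uniform distribution, $N$, and $\alpha$ below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 20, 0.01
mu, var = 0.5, 1 / 12                       # mean and variance of Uniform(0, 1)

trials = rng.uniform(0, 1, size=(100_000, N))
u = trials.mean(axis=1)                     # sample mean of N iid variables
p_emp = np.mean((u - mu) ** 2 >= alpha)     # empirical probability
bound = var / (N * alpha)                   # sigma^2 / (N * alpha)

print(f"empirical P = {p_emp:.4f} <= bound = {bound:.4f}")
```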