1 Multiple Choice Questions
- (true/false) We are machine learners with a slight gambling problem (very different from gamblers with a machine learning problem!). Our friend Bob is proposing the following payout on the roll of a die:
$\text{payout}(x) = \dots$ (1)
where $x \in \{1,2,3,4,5,6\}$ is the outcome of the roll, $(+)$ means payout to us and $(-)$ means payout to Bob. Is this a good bet, i.e., are we expected to make money?
True
False
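Equation (1) defines the payout rule; the check the question asks for is just an expected value over six equally likely outcomes. Below is a minimal sketch of that check, assuming a hypothetical payout rule (the `payout` function here is a stand-in, not the rule from (1)):

```python
# Q1 sanity check: is the expected payout of a fair six-sided die positive?
# NOTE: this payout rule is a hypothetical stand-in; equation (1) defines the real one.

def payout(x):
    """Hypothetical payout: +x when the roll is odd, -x when it is even."""
    return x if x % 2 == 1 else -x

expected = sum(payout(x) for x in range(1, 7)) / 6  # each face has probability 1/6
print(f"E[payout] = {expected:+.3f}")
print("Good bet for us!" if expected > 0 else "Bad bet; Bob wins on average.")
```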
- (1 point) X is a continuous random variable with the probability density function:
$p(x) = \dots$ (2)
Which of the following statements are true about the corresponding cumulative distribution function (CDF) $C(x)$?
[Hint: Recall that the CDF is defined as $C(x) = \Pr(X \le x)$.]
All of the above
None of the above
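Whatever the pdf in (2) turns out to be, the hint's definition pins down the mechanics: $C(x) = \int_{-\infty}^{x} p(t)\,dt$. A minimal numeric sketch of that relationship, using a stand-in density (the standard logistic, chosen only because its CDF has a closed form to compare against):

```python
# Q2 sanity check: recover the CDF C(x) = Pr(X <= x) by integrating the pdf.
import math
from scipy.integrate import quad

def pdf(t):
    """Stand-in density (standard logistic); equation (2) defines the real one."""
    s = math.exp(-abs(t))
    return s / (1.0 + s) ** 2

def cdf(x):
    val, _err = quad(pdf, -math.inf, x)  # C(x) = integral of p(t) over t <= x
    return val

for x in (-2.0, 0.0, 1.0, 3.0):
    closed = 1.0 / (1.0 + math.exp(-x))  # logistic CDF, for comparison
    print(f"C({x:+.1f}) = {cdf(x):.4f}  (closed form: {closed:.4f})")
```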
- (2 points) A random variable $x$ drawn from the standard normal distribution has the following probability density:
$p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ (3)
Evaluate the following integral:
(4)
[Hint: We are not sadistic (okay, we're a little sadistic, but not for this question). This is not a calculus question.]
$a + b + c$
$c$
$a + c$
$b + c$
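The hint is pointing at the low-order moments of the standard normal: under the $p(x)$ in (3), $\mathbb{E}[1] = 1$, $\mathbb{E}[x] = 0$, $\mathbb{E}[x^2] = 1$, and every odd moment vanishes by symmetry, so integrals of polynomials against $p(x)$ collapse to combinations of the coefficients. A quick Monte Carlo confirmation of those moments:

```python
# Q3 sanity check: low-order moments of the standard normal pdf p(x) in (3).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(2_000_000)  # draws from p(x) = exp(-x^2/2)/sqrt(2*pi)

for k in range(5):
    print(f"E[x^{k}] ~= {np.mean(samples ** k): .4f}")
# Expected: 1, 0, 1, 0, 3 -- odd moments vanish by symmetry.
```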
- (2 points) Consider the following function of $x = (x_1, x_2, x_3, x_4, x_5, x_6)$:
$f(x) = \dots$ (5)
where $\sigma$ is the sigmoid function
$\sigma(z) = \frac{1}{1 + e^{-z}}$ (6)
Compute the gradient $\nabla_x f(\cdot)$ and evaluate it at $x = (5, 1, 6, 12, 7, 5)$.
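Hand-derived gradients like the one requested here are easy to verify with central finite differences. A minimal sketch, using a hypothetical $f$ built from the sigmoid in (6) (a stand-in, since equation (5) defines the real one):

```python
# Q4 sanity check: compare a hand-derived gradient against finite differences.
# NOTE: this f is a hypothetical stand-in; equation (5) defines the real one.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    """Stand-in scalar function of x in R^6."""
    return sigmoid(x[0] * x[1] + x[2] / x[3] - (x[4] + x[5]))

def grad_fd(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([5.0, 1.0, 6.0, 12.0, 7.0, 5.0])
print(grad_fd(f, x))  # should match the chain-rule gradient entry by entry
```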
- (2 points) Which of the following functions are convex?
$\|x\|$ for $x \in \mathbb{R}^n$
$\dots$ for $w \in \mathbb{R}^d$
All of the above
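Recall the definition being tested: $f$ is convex iff $f(\lambda a + (1-\lambda) b) \le \lambda f(a) + (1-\lambda) f(b)$ for all $a, b$ and $\lambda \in [0, 1]$. A randomized screen of that inequality (a quick falsifier, not a proof), shown here on two illustrative functions:

```python
# Q5 sanity check: randomized test of the convexity inequality.
# A single violation disproves convexity; passing only suggests it.
import numpy as np

rng = np.random.default_rng(0)

def seems_convex(f, dim, trials=10_000):
    for _ in range(trials):
        a, b = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        if f(lam * a + (1 - lam) * b) > lam * f(a) + (1 - lam) * f(b) + 1e-9:
            return False
    return True

print(seems_convex(np.linalg.norm, dim=5))           # norms are convex -> True
print(seems_convex(lambda v: -np.dot(v, v), dim=5))  # concave -> False
```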
- (2 points) Suppose you want to predict an unknown value $Y \in \mathbb{R}$, but you are only given a sequence of noisy observations $x_1, \dots, x_n$ of $Y$ with i.i.d. noise ($x_i = Y + \epsilon_i$). If we assume the noise is i.i.d. Gaussian ($\epsilon_i \sim \mathcal{N}(0, \sigma^2)$), the maximum likelihood estimate $\hat{y}$ for $Y$ can be given by:
$\hat{y} = \operatorname{argmin}_y \dots$
$\hat{y} = \operatorname{argmin}_y \dots$
Both A & C
Both B & C
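The Gaussian assumption is what makes this tractable: up to constants, the log-likelihood is $-\sum_i (y - x_i)^2 / (2\sigma^2)$, so maximizing it means minimizing the sum of squared errors, whose minimizer is the sample mean. A minimal numeric illustration of that equivalence:

```python
# Q6 sanity check: under Gaussian noise, argmin of squared error = sample mean.
import numpy as np

rng = np.random.default_rng(0)
Y = 3.7
x = Y + rng.normal(scale=0.5, size=1000)  # noisy observations x_i = Y + eps_i

grid = np.linspace(2.0, 5.0, 3001)                         # candidate values of y
sq_loss = ((grid[:, None] - x[None, :]) ** 2).sum(axis=1)  # sum_i (y - x_i)^2
y_hat = grid[np.argmin(sq_loss)]

print(f"argmin of squared error: {y_hat:.3f}")
print(f"sample mean:             {x.mean():.3f}")  # should agree to grid precision
```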
2 Proofs
- Prove that
$\log_e x \le x - 1, \quad \forall x > 0$ (7)
with equality if and only if $x = 1$.
[Hint: Consider differentiating $\log(x) - (x - 1)$ and think about concavity/convexity and second derivatives.]
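The hint's function $g(x) = \log(x) - (x - 1)$ satisfies $g'(1) = 0$ and $g''(x) = -1/x^2 < 0$, so $x = 1$ is the unique global maximum with $g(1) = 0$; that is the proof in outline. A quick numeric look consistent with it:

```python
# Q7 sanity check: g(x) = log(x) - (x - 1) should be <= 0, with max 0 at x = 1.
import numpy as np

x = np.linspace(1e-6, 10.0, 1_000_001)
g = np.log(x) - (x - 1.0)

print(f"max g(x)      = {g.max():.2e}")        # ~0 (up to grid resolution)
print(f"argmax of g   = {x[g.argmax()]:.6f}")  # ~1
print(f"all g(x) <= 0? {bool((g <= 1e-12).all())}")
```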
- (6 points) Consider two discrete probability distributions p and q over k outcomes:
$\sum_{i=1}^{k} p_i = \sum_{i=1}^{k} q_i = 1$ (8a)
$p_i > 0, \; q_i > 0, \quad \forall i \in \{1, \dots, k\}$ (8b)
The Kullback-Leibler (KL) divergence (also known as the relative entropy) between these distributions is given by:
$\mathrm{KL}(p, q) = \sum_{i=1}^{k} p_i \log \frac{p_i}{q_i}$ (9)
It is common to refer to KL(p,q) as a measure of distance (even though it is not a proper metric). Many algorithms in machine learning are based on minimizing the KL divergence between two probability distributions. In this question, we will show why this might be a sensible thing to do.
[Hint: This question doesn't require you to know anything more than the definition of KL(p,q) and the identity in Q7.]
- Using the results from Q7, show that KL(p,q) is always non-negative.
- When is KL(p,q) = 0?
- Provide a counterexample to show that the KL divergence is not a symmetric function of its arguments: $\mathrm{KL}(p, q) \neq \mathrm{KL}(q, p)$.
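All three sub-questions can be probed numerically before writing the proofs. A minimal sketch implementing equation (9) directly: KL of random distributions is non-negative, KL(p,p) is exactly zero, and swapping the arguments generally changes the value:

```python
# Q8 sanity check: non-negativity, the zero case, and asymmetry of KL(p, q).
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL(p, q) = sum_i p_i * log(p_i / q_i), per equation (9)."""
    return float(np.sum(p * np.log(p / q)))

p = rng.random(5); p /= p.sum()  # random distribution with p_i > 0
q = rng.random(5); q /= q.sum()

print(f"KL(p, q) = {kl(p, q):.4f}   (always >= 0)")
print(f"KL(p, p) = {kl(p, p):.4f}   (zero iff the arguments are equal)")
print(f"KL(q, p) = {kl(q, p):.4f}   (!= KL(p, q) in general)")
```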
- (6 points) In this question, you will prove that the cross-entropy loss for a softmax classifier is convex in the model parameters, and thus gradient descent is guaranteed to find the optimal parameters. Formally, consider a single training example $(x, y)$. Simplifying the notation slightly from the implementation writeup, let
$z = Wx + b$, (10)
$\hat{y} = \mathrm{softmax}(z), \qquad \hat{y}_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$ (11)
$L(W) = -\log \hat{y}_y$ (12)
Prove that $L(W)$ is convex in $W$.
[Hint: One way of solving this problem is brute force with first principles and Hessians. There are more elegant solutions.]
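One non-elegant route the hint allows: in the logits, the Hessian of $L$ works out to $\mathrm{diag}(\hat{y}) - \hat{y}\hat{y}^\top$, which is positive semidefinite, and pre-composing with the linear map $z = Wx + b$ preserves convexity. A numeric sanity check of that PSD claim, assuming the standard softmax cross-entropy form of (11)-(12):

```python
# Q9 sanity check: the softmax cross-entropy Hessian in z is PSD.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=7)  # logits z = Wx + b for one example

y_hat = np.exp(z - z.max())
y_hat /= y_hat.sum()  # softmax(z), equation (11)

# Hessian of L = -log(y_hat[y]) with respect to z (independent of the label y):
H = np.diag(y_hat) - np.outer(y_hat, y_hat)

eigvals = np.linalg.eigvalsh(H)
print(f"min eigenvalue: {eigvals.min():.2e}")  # >= 0 up to float error
print(f"PSD? {bool((eigvals >= -1e-12).all())}")
```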