1. Maximum Likelihood Method: consider $n$ random samples from a multivariate normal distribution, $X_i \in \mathbb{R}^p \sim \mathcal{N}(\mu, \Sigma)$ with $i = 1, \dots, n$.
- (a) Show that the log-likelihood function is
  $$\ell(\mu, \Sigma) = -\frac{n}{2}\,\mathrm{trace}\big(\Sigma^{-1}\hat{\Sigma}_n\big) - \frac{n}{2}\log\det(\Sigma) + C,$$
  where $\hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)(X_i - \mu)^T$ and the constant $C$ does not depend on $\mu$ and $\Sigma$;
- (b) Show that $f(X) = \mathrm{trace}(AX^{-1})$ with $X \succ 0$ has the first-order approximation
  $$f(X + \Delta) \approx f(X) - \mathrm{trace}\big(X^{-1}AX^{-1}\Delta\big),$$
  hence formally $df(X)/dX = -X^{-1}AX^{-1}$ (note: $(I + X)^{-1} \approx I - X$);
- (c) Show that $g(X) = \log\det(X)$ with $X \succ 0$ has the first-order approximation
  $$g(X + \Delta) \approx g(X) + \mathrm{trace}\big(X^{-1}\Delta\big),$$
  hence $dg(X)/dX = X^{-1}$ (note: consider the eigenvalues of $X^{-1/2}\Delta X^{-1/2}$);
- (d) Use these formal derivatives with respect to positive semi-definite matrix variables to show that the maximum likelihood estimator of $\Sigma$ is
  $$\hat{\Sigma}^{MLE}_n = \frac{1}{n}\sum_{i=1}^n (X_i - \hat{\mu}_n)(X_i - \hat{\mu}_n)^T, \qquad \hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
  (a numerical sanity check is sketched after the reference below).
A reference for (b) and (c) can be found in Convex Optimization, by Boyd and Vandenberghe, examples in Appendix A.4.1 and A.4.3: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
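As a numerical sanity check for part (d), the sketch below draws Gaussian samples and compares $\frac{1}{n}\sum_i (X_i - \hat{\mu}_n)(X_i - \hat{\mu}_n)^T$ against numpy's biased sample covariance; the sample size, dimension, mean, and covariance chosen here are arbitrary illustrations, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3                          # illustrative sample size and dimension
mu = np.array([1.0, -2.0, 0.5])        # an arbitrary true mean
A = rng.standard_normal((p, p))
Sigma = A @ A.T + np.eye(p)            # an arbitrary positive definite covariance

X = rng.multivariate_normal(mu, Sigma, size=n)   # n samples in R^p

mu_hat = X.mean(axis=0)                          # sample mean
centered = X - mu_hat
Sigma_mle = centered.T @ centered / n            # (1/n) * sum (X_i - mu_hat)(X_i - mu_hat)^T

# np.cov with bias=True uses the same 1/n normalization
print(np.allclose(Sigma_mle, np.cov(X.T, bias=True)))   # True
print(np.linalg.norm(Sigma_mle - Sigma))                 # small for moderately large n
```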
2. Shrinkage: Suppose $y \sim \mathcal{N}(\mu, I_p)$.
- Consider the Ridge regression
  $$\hat{\mu}^{ridge}_\lambda = \arg\min_{\mu} \frac{1}{2}\|y - \mu\|_2^2 + \frac{\lambda}{2}\|\mu\|_2^2.$$
  Show that the solution is given by
  $$\hat{\mu}^{ridge}_\lambda = \frac{1}{1+\lambda}\, y =: Cy.$$
  Compute the risk (mean square error) of this estimator. The risk of the MLE is obtained when $C = I$. (A simulation sketch follows this part.)
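A minimal Monte Carlo sketch, assuming the ridge solution takes the form $y/(1+\lambda)$ as stated above: it compares the simulated mean square error with the closed-form value $\big(p + \lambda^2\|\mu\|^2\big)/(1+\lambda)^2$ that the bias-variance computation should produce. The values of $p$, $\lambda$, and $\mu$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
p, lam, trials = 10, 0.5, 200_000      # illustrative choices
mu = rng.standard_normal(p)            # an arbitrary true mean

y = mu + rng.standard_normal((trials, p))      # y ~ N(mu, I_p), one row per trial
mu_ridge = y / (1.0 + lam)                     # candidate closed-form ridge solution

mc_risk = np.mean(np.sum((mu_ridge - mu) ** 2, axis=1))
formula = (p + lam**2 * np.sum(mu**2)) / (1.0 + lam) ** 2   # variance + squared bias
print(mc_risk, formula)    # should agree up to Monte Carlo error
print("MLE risk:", p)      # risk of the MLE (lambda = 0, i.e. C = I)
```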
- Consider the LASSO problem,
  $$\hat{\mu}^{lasso}_\lambda = \arg\min_{\mu} \frac{1}{2}\|y - \mu\|_2^2 + \lambda\|\mu\|_1.$$
  Show that the solution is given by Soft-Thresholding,
  $$\hat{\mu}^{soft}_i = \mu^{soft}(y_i; \lambda) := \mathrm{sign}(y_i)\,(|y_i| - \lambda)_+.$$
  For the choice $\lambda = \sqrt{2\log p}$, show that the risk is bounded by
  $$E\|\hat{\mu}^{soft}(y) - \mu\|^2 \le 1 + (2\log p + 1)\sum_{i=1}^{p}\min(\mu_i^2, 1).$$
  Under what conditions on $\mu$ is such a risk smaller than that of the MLE? Note: see Gaussian Estimation by Iain Johnstone, Lemma 2.9 and the reasoning before it. (A soft-thresholding sketch follows this part.)
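A sketch of soft thresholding at $\lambda = \sqrt{2\log p}$ with unit noise, compared against the stated bound; the sparse mean used here is only an example, chosen to make the bound far smaller than the MLE risk $p$.

```python
import numpy as np

def soft(y, lam):
    """Soft-thresholding: sign(y) * (|y| - lam)_+ applied coordinatewise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(2)
p, trials = 1000, 5_000
lam = np.sqrt(2 * np.log(p))

mu = np.zeros(p)
mu[:10] = 3.0                          # a sparse mean, chosen only for illustration

y = mu + rng.standard_normal((trials, p))
risk = np.mean(np.sum((soft(y, lam) - mu) ** 2, axis=1))
bound = 1 + (2 * np.log(p) + 1) * np.sum(np.minimum(mu**2, 1.0))
print(risk, bound, "MLE risk:", p)     # risk <= bound, and both are far below p here
```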
- Consider the $\ell_0$ regularization
  $$\hat{\mu}^{\ell_0}_\lambda = \arg\min_{\mu} \frac{1}{2}\|y - \mu\|_2^2 + \frac{\lambda^2}{2}\|\mu\|_0,$$
  where $\|\mu\|_0 = \#\{i : \mu_i \neq 0\}$. Show that the solution is given by Hard-Thresholding,
  $$\hat{\mu}^{hard}_i = \mu^{hard}(y_i; \lambda) := y_i\, I(|y_i| > \lambda).$$
  Rewriting $\hat{\mu}^{hard}(y) = (1 - g(y))\, y$, is $g(y)$ weakly differentiable? Why? (A hard-thresholding sketch follows this part.)
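A companion sketch of hard thresholding and of the factor $g(y)$ from the rewriting above (here simply the indicator $I(|y| \le \lambda)$); the threshold and grid values are arbitrary.

```python
import numpy as np

def hard(y, lam):
    """Hard-thresholding: y_i * I(|y_i| > lam)."""
    return y * (np.abs(y) > lam)

def g(y, lam):
    """Factor in hard(y) = (1 - g(y)) * y, i.e. g(y) = I(|y| <= lam)."""
    return (np.abs(y) <= lam).astype(float)

lam = 2.0
t = np.linspace(-4, 4, 9)
print(hard(t, lam))                                      # jumps at |t| = lam
print(np.allclose(hard(t, lam), (1 - g(t, lam)) * t))    # True
```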
- Consider the James-Stein Estimator
  $$\hat{\mu}^{JS}_\alpha(y) = \left(1 - \frac{\alpha}{\|y\|^2}\right) y.$$
  Show that the risk is
  $$E\|\hat{\mu}^{JS}(y) - \mu\|^2 = E\,U(y),$$
  where $U(y) = p - \big(2\alpha(p-2) - \alpha^2\big)/\|y\|^2$. Find the optimal $\alpha^* = \arg\min_\alpha U(y)$. Show that for $p > 2$, the risk of the James-Stein Estimator is smaller than that of the MLE for all $\mu \in \mathbb{R}^p$. (A simulation sketch follows this part.)
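A Monte Carlo sketch comparing the James-Stein estimator, with the candidate $\alpha = p - 2$ obtained from minimizing $U(y)$, against the MLE $\hat{\mu} = y$; the dimension and mean are arbitrary, and the comparison is only an empirical illustration of the $p > 2$ claim.

```python
import numpy as np

rng = np.random.default_rng(3)
p, trials = 10, 100_000
mu = np.full(p, 0.5)                 # an arbitrary true mean
alpha = p - 2                        # candidate optimal alpha from minimizing U(y)

y = mu + rng.standard_normal((trials, p))          # y ~ N(mu, I_p), one row per trial
shrink = 1.0 - alpha / np.sum(y**2, axis=1)        # (1 - alpha / ||y||^2)
js = shrink[:, None] * y                           # James-Stein estimate per trial

print("JS risk :", np.mean(np.sum((js - mu) ** 2, axis=1)))   # noticeably below p
print("MLE risk:", p)                                         # E||y - mu||^2 = p
```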
- In general, an odd monotone unbounded function $\eta : \mathbb{R} \to \mathbb{R}$, written $\eta_\lambda(t)$ with parameter $\lambda \ge 0$, is called a shrinkage rule if it satisfies
  - [shrinkage] $0 \le \eta_\lambda(|t|) \le |t|$;
  - [odd] $\eta_\lambda(-t) = -\eta_\lambda(t)$;
  - [monotone] $\eta_\lambda(t) \le \eta_\lambda(t')$ for $t \le t'$;
  - [unbounded] $\lim_{t \to \infty} \eta_\lambda(t) = \infty$.
  Which of the rules above are shrinkage rules? (A numerical spot-check is sketched below.)
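A small spot-check, assuming soft and hard thresholding as candidate rules: it verifies the [shrinkage], [odd], and [monotone] properties on a finite grid (the [unbounded] property is immediate from the formulas). This is only a numerical aid, not a proof.

```python
import numpy as np

lam = 1.5
t = np.linspace(-10, 10, 2001)

rules = {
    "soft": lambda t: np.sign(t) * np.maximum(np.abs(t) - lam, 0.0),
    "hard": lambda t: t * (np.abs(t) > lam),
}

for name, eta in rules.items():
    shrinkage = np.all((0 <= eta(np.abs(t))) & (eta(np.abs(t)) <= np.abs(t)))
    odd = np.allclose(eta(-t), -eta(t))
    monotone = np.all(np.diff(eta(t)) >= 0)
    print(name, shrinkage, odd, monotone)   # True, True, True for both on this grid
```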
- Necessary Condition for Admissibility of Linear Estimators. Consider the linear estimator for $y \sim \mathcal{N}(\mu, \sigma^2 I_p)$,
  $$\hat{\mu}_C(y) = Cy.$$
  Show that $\hat{\mu}_C$ is admissible only if
  - (a) $C$ is symmetric;
  - (b) $0 \le \rho_i(C) \le 1$ (where $\rho_i(C)$ are the eigenvalues of $C$);
  - (c) $\rho_i(C) = 1$ for at most two $i$.
  These conditions are satisfied by the MLE estimator when $p = 1$ and $p = 2$.
Reference: Theorem 2.3 in Gaussian Estimation by Iain Johnstone, http://statweb.stanford.edu/~imj/Book100611.pdf
- *James-Stein Estimator for $p = 1, 2$ and an upper bound: If we use SURE to calculate the risk of the James-Stein Estimator, it seems that for $p = 1$ the James-Stein Estimator should still have lower risk than the MLE for any $\mu$. Can you find what actually happens in the $p = 1$ and $p = 2$ cases? Moreover, can you derive an upper bound for the risk of the James-Stein Estimator?