Required reading: Wright and Recht, Ch. 5.
Throughout the problem set, ‖ · ‖ stands for the 2-norm ‖ · ‖₂.
1 (Exercise 4(b) from Wright and Recht, Ch. 5) Consider the general additive Gaussian noise model
g(x;ξ) = ∇f(x) + ξ,
where ξ is an n-dimensional Gaussian random vector with mean 0 and covariance matrix σ²In. Suppose
we estimate the gradient ∇f(x) using a minibatch of size s ≥ 1:
g(x; ξ1,…,ξs) := (1/s) Σ_{i=1}^s (∇f(x) + ξi) = ∇f(x) + (1/s)(ξ1 + ··· + ξs),
where ξ1,…,ξs are i.i.d. Gaussian random vectors as above. Show that

E‖g(x; ξ1,…,ξs) − ∇f(x)‖² = nσ²/s.
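A quick Monte Carlo sanity check of this identity (not part of the exercise; the values of n, σ, s, and the trial count below are arbitrary choices):

```python
import numpy as np

# Empirically check E||g(x; xi_1,...,xi_s) - grad f(x)||^2 = n*sigma^2/s.
# Since g - grad f = (1/s) * sum_i xi_i, only the noise term matters here.
rng = np.random.default_rng(0)
n, sigma, s, trials = 10, 0.5, 4, 200_000

noise = rng.normal(0.0, sigma, size=(trials, s, n)).mean(axis=1)
empirical = np.mean(np.sum(noise**2, axis=1))
print(f"empirical : {empirical:.4f}")
print(f"predicted : {n * sigma**2 / s:.4f}")  # n*sigma^2/s = 0.625
```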
2 (Exercise 6 from Wright and Recht, Ch. 5) Let f : Rn → R be an m-strongly convex function that can be expressed as an expectation of the form f(x) = Eξ[F(x;ξ)]. Assume that there exist constants Lg > 0 and B ≥ 0, such that
E‖∇F(x;ξ)‖² ≤ Lg²‖x − x∗‖² + B², for all x ∈ Rn,
where x∗ is the unique global minimizer of f. Suppose we run the stochastic gradient method on f by
sampling ξ and taking steps along −∇F(x;ξ) using an epoch-doubling approach. That is, we run for T steps with steplength α, then for 2T steps with steplength α/2, then for 4T steps with steplength α/4, and so on. Let x̂t be the average of all the iterates in the tth epoch. How many epochs are required to guarantee that E‖x̂t − x∗‖² ≤ ε?
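A sketch of the epoch-doubling schedule on a toy problem (not a solution to the exercise; the model f(x) = Eξ[½‖x − ξ‖²] with ξ ~ N(x∗, I), and the constants T, α, and the dimension are all illustrative assumptions):

```python
import numpy as np

# Epoch-doubling SGD: epoch t runs 2^t * T steps with steplength alpha / 2^t,
# and x_hat_t is the average of the iterates within epoch t.
rng = np.random.default_rng(1)
dim = 5
x_star = np.ones(dim)
x = np.zeros(dim)
T, alpha = 100, 0.5

for t in range(8):
    steps, step = T * 2**t, alpha / 2**t
    x_hat = np.zeros(dim)
    for _ in range(steps):
        xi = x_star + rng.normal(size=dim)
        x -= step * (x - xi)   # grad F(x; xi) = x - xi for F = 0.5*||x - xi||^2
        x_hat += x / steps
    print(f"epoch {t}: ||x_hat_t - x*||^2 = {np.sum((x_hat - x_star)**2):.2e}")
```

The printed quantity should shrink roughly geometrically in t, which is the behavior the question asks you to quantify.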
3 Consider the finite-sum objective

f(x) := (1/N) Σ_{i=1}^N fi(x).
A simple mini-batching strategy is to sample p ≥ 1 indices i1,…,ip ∈ {1,…,N} uniformly at random with replacement and estimate the gradient ∇f(x) by
g(x;ξ) := (1/p) [∇fi1(x) + ··· + ∇fip(x)],
where ξ := (i1,…,ip). Prove that, for each x ∈ Rn,
Eξ[g(x;ξ)] = ∇f(x)
and

Eξ‖g(x;ξ) − ∇f(x)‖² = (1/p) [ (1/N) Σ_{i=1}^N ‖∇fi(x)‖² − ‖∇f(x)‖² ].
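Both claims are easy to confirm numerically (not part of the exercise). In the sketch below the rows of G stand in for the fixed gradients ∇f1(x),…,∇fN(x) at some point x; all sizes are arbitrary choices:

```python
import numpy as np

# Check unbiasedness and the variance identity for with-replacement minibatching.
rng = np.random.default_rng(2)
N, p, n, trials = 8, 3, 4, 400_000
G = rng.normal(size=(N, n))        # G[i] plays the role of grad f_i(x)
grad_f = G.mean(axis=0)            # grad f(x) = (1/N) sum_i grad f_i(x)

idx = rng.integers(0, N, size=(trials, p))  # i1,...,ip i.i.d. uniform on {1,...,N}
g = G[idx].mean(axis=1)                     # g(x; xi), one row per trial

print("bias     :", np.linalg.norm(g.mean(axis=0) - grad_f))   # ~ 0
empirical = np.mean(np.sum((g - grad_f)**2, axis=1))
predicted = (np.mean(np.sum(G**2, axis=1)) - np.sum(grad_f**2)) / p
print(f"variance : empirical {empirical:.5f} vs predicted {predicted:.5f}")
```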
4 Consider the finite-sum objective

f(x) := (1/N) Σ_{i=1}^N fi(x)

satisfying the following:
• The functions fi are all nonnegative and L-smooth, and there exists a point x∗ such that fi(x∗) = 0 for all i.
• There exists a constant m > 0, such that
‖∇f(x)‖² ≥ 2m f(x), for all x ∈ Rn.
That is, f satisfies the Polyak–Łojasiewicz (PL) condition; see Section 3.8 of Wright and Recht.
Suppose that we run the following stochastic gradient method: Starting with an arbitrary initialization x0 ∈ Rn, at each iteration k = 0,1,2,… we generate
xk+1 = xk − αg(xk;ξk),
where α > 0 is a constant steplength and where g(xk;ξk) is an estimate of ∇f(xk) using a mini-batch of size
p, as in Problem 3.
(a) Prove that, for each k,

E[f(xk+1)] ≤ E[f(xk)] − α(1 − Lα/2) E‖∇f(xk)‖² + (Lα²/(2p)) E[ (1/N) Σ_{i=1}^N ‖∇fi(xk)‖² − ‖∇f(xk)‖² ].
Hint: Use smoothness and the result of Problem 3.
(b) Assume that α ≤ 2/L. Prove that, under our assumptions on f, it follows from the result in (a) that

E[f(xk+1)] ≤ (1 − 2mα + (mL + L²/p) α²) E[f(xk)].
Hint: Since each fi is L-smooth, ‖∇fi(x)‖² ≤ 2Lfi(x) for all x ∈ Rn (recall Problem 4(a) in Homework 1).
(c) Starting from the result in (b), show that

E[f(xk)] ≤ (1 − 2mα + (mL + L²/p) α²)^k E[f(x0)].
(d) By optimizing over the choice of α in part (c), prove that our stochastic gradient method converges at an exponential rate:
E[f(xk)] ≤ (1 − mα∗)^k E[f(x0)],

where α∗ := m/(mL + L²/p).
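As an illustration of part (d) (not part of the exercise), the sketch below runs the method on an interpolating least-squares problem, fi(x) = ½(aiᵀx − bi)² with bi = aiᵀx∗, so that fi(x∗) = 0 for all i; the problem sizes, the data, and the smoothness/PL constants computed in the code are all illustrative assumptions:

```python
import numpy as np

# Minibatch SGD with the candidate steplength alpha* = m / (mL + L^2/p)
# on f(x) = (1/N) sum_i 0.5*(a_i^T x - b_i)^2 with b = A x_star (interpolation).
rng = np.random.default_rng(3)
N, n, p = 50, 10, 5
A = rng.normal(size=(N, n))
x_star = rng.normal(size=n)
b = A @ x_star

f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
L = np.max(np.sum(A**2, axis=1))          # each f_i is ||a_i||^2-smooth
m = np.linalg.eigvalsh(A.T @ A / N)[0]    # PL constant of f (here A^T A / N > 0)
alpha = m / (m * L + L**2 / p)

x = np.zeros(n)
for k in range(5001):
    if k % 1000 == 0:
        print(f"k={k:5d}   f(x_k) = {f(x):.3e}")
    i = rng.integers(0, N, size=p)        # minibatch indices, with replacement
    x -= alpha * (A[i] * (A[i] @ x - b[i])[:, None]).mean(axis=0)
```

The printed values decay geometrically, consistent with the (1 − mα∗)^k bound.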