
EE-556 Homework Exercise 2 (for Lecture 7)


1 Minimax problems and GANs
Consider the function f : R^2 → R where f(x, y) = xy.
1. (10 points) Find the first-order stationary points, and classify each of them as a local minimum, a local maximum or a saddle point according to the Hessian.
2. (10 points) Show that (x*, y*) = (0, 0) is a solution to the minimax problem min_x max_y f(x, y). This means that f(x*, y*) ≥ f(x*, y) and f(x*, y*) ≤ f(x, y*), for all x, y.
3. (20 points) One possible attempt at finding this solution via iterative first-order methods is to perform gradient updates on the variables x and y. More precisely, for γ > 0 consider the gradient descent/ascent updates
x_{k+1} = x_k − γ ∇_x f(x_k, y_k),    y_{k+1} = y_k + γ ∇_y f(x_k, y_k).
Show that the sequence of iterates {(x_k, y_k)}_{k=0}^∞, starting from any point (x_0, y_0) ≠ (0, 0), diverges for any γ > 0. Find the rate at which its distance to the origin grows.
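For intuition only (it does not replace the requested proof), the divergence of these updates can be checked numerically in a few lines of Python; the snippet below is a minimal sketch and is not part of the provided code.

```python
import math

# Simultaneous gradient descent/ascent on f(x, y) = x*y:
#   x_{k+1} = x_k - gamma * y_k,   y_{k+1} = y_k + gamma * x_k
gamma = 0.1
x, y = 1.0, 0.0                 # any starting point other than the origin
for k in range(10):
    x, y = x - gamma * y, y + gamma * x
    print(k, math.hypot(x, y))  # the distance to the origin grows geometrically
```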
In the context of GANs, suppose the true distribution is a multivariate normal on R^2 (with mean and covariance matrix as specified in the code), and the noise distribution is a standard normal on R^2. Your generator and dual variable classes are defined as
G := {g : g(z) = Wz + b},    F := {f : f(x) = v^T x}    (1)
for a matrix W ∈ R^{2×2} and vectors b, v ∈ R^2.
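For concreteness, the two classes in (1) correspond to very small parametric models. A hypothetical PyTorch sketch is given below; the class names are illustrative, and the actual classes to complete live in variables.py.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """g(z) = W z + b, with W in R^{2x2} and b in R^2 (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(2, 2))
        self.b = nn.Parameter(torch.zeros(2))

    def forward(self, z):          # z has shape (batch, 2)
        return z @ self.W.T + self.b

class DualVariable(nn.Module):
    """f(x) = v^T x, with v in R^2 (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.v = nn.Parameter(torch.randn(2))

    def forward(self, x):          # x has shape (batch, 2)
        return x @ self.v
```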
1. (10 points) Suppose the space R^2 is equipped with the ℓ2-norm ‖(x, y)‖_2^2 = x^2 + y^2. For any f ∈ F, compute its Lipschitz constant with respect to this norm. Describe the set of functions in F whose Lipschitz constant is at most 1.
2. (10 points) Implement the generator and dual variable functions, together with a function enforcing the 1-Lipschitz constraint on the dual variable (all in variables.py). Then implement a stochastic estimate of the objective function of the minimax game (in trainer.py):
min_{g∈G} max_{f∈F} E[f(X) − f(g(Z))]    (2)
where X has the true distribution and Z has the noise distribution. In order to implement the stochastic estimate you will use samples from these distributions. Use the methods available for the class torch.distributions.Distribution from the PyTorch module. (An illustrative sketch of these pieces appears after this list.)
Remark: The enforce_lipschitz method should not return any value; it should modify the parameters of f in place. In order to do this we recommend directly modifying any tensor's values by rewriting its data attribute, e.g., x.data = x.data + 1. This avoids PyTorch's automatic tracking of operations for automatic differentiation, which might otherwise cause issues.
3. (25 points) Complete the missing functions in trainer.py. You should implement simultaneous and alternating stochastic gradient ascent/descent updates. More specifically,
f_{k+1} = f_k + γ SG_f(f_k, g_k),    g_{k+1} = g_k − γ SG_g(f_k, g_k)    (simultaneous)    (3)
f_{k+1} = f_k + γ SG_f(f_k, g_k),    g_{k+1} = g_k − γ SG_g(f_{k+1}, g_k)    (alternating)    (4)
where SG is the stochastic gradient oracle.
In the file optim.py we provide a modification of PyTorch's SGD optimizer which allows for negative learning rates (step sizes). This is only a trick that lets us perform stochastic gradient descent for the generator and stochastic gradient ascent for the dual variable: PyTorch's Optimizer.step() method always takes a step in the direction of the negative gradient, so by allowing both positive and negative learning rates we can effectively switch between gradient descent and ascent.
Run both methods using the script train.py, passing the option --training_mode simultaneous or --training_mode alternating. Include the generated plots in your report and comment on your findings. (An illustrative sketch of both update schemes appears after this list.)
4. (10 points) Show that, given two distributions µ and ν, if F is the class of functions defined in (1), it holds that
max_{f∈F} E_{X∼µ}[f(X)] − E_{Y∼ν}[f(Y)] ≥ 0.    (5)
5. (15 points) Show that, given two distributions µ and ν on R^2, if their first moments coincide, i.e.,
E_{(x1,x2)∼µ}[x1] = E_{(x1,x2)∼ν}[x1],    E_{(x1,x2)∼µ}[x2] = E_{(x1,x2)∼ν}[x2],    (6)
then, if F is the class of functions defined in (1), it holds that
max_{f∈F} E_{X∼µ}[f(X)] − E_{Y∼ν}[f(Y)] = 0.    (7)
Why is this a possible explanation for the observed behaviour of our GAN example?
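Regarding item 2 above, one possible shape for the two pieces is sketched below. This is only an illustration, not the provided solution: it assumes the dual variable stores its vector as a parameter named v, that true_dist and noise_dist are torch.distributions.Distribution objects, and that the function names enforce_lipschitz and objective match the scaffold in variables.py and trainer.py.

```python
def enforce_lipschitz(f):
    """Project the dual variable's vector v onto the unit l2-ball, in place.

    Writes to f.v.data directly so that the projection is not tracked by autograd.
    """
    norm = f.v.data.norm(p=2)
    if norm > 1.0:
        f.v.data = f.v.data / norm

def objective(f, g, true_dist, noise_dist, batch_size=256):
    """One-minibatch stochastic estimate of E[f(X)] - E[f(g(Z))] in (2)."""
    x = true_dist.sample((batch_size,))    # X ~ true distribution
    z = noise_dist.sample((batch_size,))   # Z ~ noise distribution
    return f(x).mean() - f(g(z)).mean()
```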
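Regarding item 3 above, the only difference between the two schemes is whether the generator's gradient is evaluated before or after the dual update. The sketch below is illustrative: it reuses the enforce_lipschitz helper from the sketch above, assumes f_opt was built with the modified SGD from optim.py using a negative learning rate (so that stepping performs ascent on the dual variable), and assumes loss_fn() returns a fresh stochastic estimate of objective (2).

```python
def simultaneous_step(f, f_opt, g_opt, loss_fn):
    """One simultaneous update (3): both gradients are taken at (f_k, g_k)."""
    loss = loss_fn()
    f_opt.zero_grad()
    g_opt.zero_grad()
    loss.backward()
    f_opt.step()              # ascent on f (negative learning rate from optim.py)
    g_opt.step()              # descent on g
    enforce_lipschitz(f)      # keep the dual variable 1-Lipschitz

def alternating_step(f, f_opt, g_opt, loss_fn):
    """One alternating update (4): f is updated first, then the gradient is
    re-evaluated at (f_{k+1}, g_k) before updating g."""
    f_opt.zero_grad()
    loss_fn().backward()
    f_opt.step()
    enforce_lipschitz(f)
    g_opt.zero_grad()
    loss_fn().backward()
    g_opt.step()
```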
2 Optimizers for Neural Networks
The goal of this exercise is to implement different optimizers for a handwritten digit classifier using the well-known MNIST dataset.
This dataset has 60000 training images, and 10000 test images. Each image is of size 28 × 28 pixels, and shows a digit from 0 to 9.
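For reference, the dataset can be obtained through torchvision; the snippet below is only an illustrative sketch (the provided scripts may already handle data loading, and the path and batch sizes are placeholders).

```python
import torch
from torchvision import datasets, transforms

# Illustrative MNIST loaders; path and batch sizes are placeholders.
transform = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
```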
(a) General guideline: complete the code marked with TODO in optimizers.py.
(b) The implementations of mini-batch SGD and SGD with heavy-ball (HB) momentum are provided as examples.
Vanilla Minibatch SGD
Input: learning rate γ
1. initialize θ_0
2. For t = 0, 1, …, N−1:
   obtain the minibatch gradient ĝ_t
   update θ_{t+1} ← θ_t − γ ĝ_t
Minibatch SGD with Momentum
Input: learning rate γ, momentum ρ
1. initialize θ_0, m_0 ← 0
2. For t = 0, 1, …, N−1:
   obtain the minibatch gradient ĝ_t
   update m_{t+1} ← ρ m_t + ĝ_t
   update θ_{t+1} ← θ_t − γ m_{t+1}
(c) Implement the following optimizers (an illustrative sketch of all three update rules, for reference only, follows item (d) below):
(a) (10 points) Implement the AdaGrad method
AdaGrad
Input: global learning rate γ, damping coefficient δ
1. initialize θ_0, r ← 0
2. For t = 0, 1, …, N−1:
   obtain the minibatch gradient ĝ_t
   update r ← r + ĝ_t ⊙ ĝ_t
   update θ_{t+1} ← θ_t − (γ / (δ + √r)) ⊙ ĝ_t
where ⊙ denotes element-wise multiplication between two matrices (the square root and division are likewise applied element-wise).
(b) (10 points) Implement RMSProp
RMSProp
Input: global learning rate γ, damping coefficient δ, decay parameter τ
1. initialize θ_0, r ← 0
2. For t = 0, 1, …, N−1:
   obtain the minibatch gradient ĝ_t
   update r ← τ r + (1 − τ) ĝ_t ⊙ ĝ_t
   update θ_{t+1} ← θ_t − (γ / (δ + √r)) ⊙ ĝ_t
(c) (10 points) Implement the Adam method
Adam
Input: global learning rate γ, damping coefficient δ, first-order decay parameter β1, second-order decay parameter β2
1. initialize θ_0, m_1 ← 0, m_2 ← 0
2. For t = 0, 1, …, N−1:
   obtain the minibatch gradient ĝ_t
   update m_1 ← β1 m_1 + (1 − β1) ĝ_t
   update m_2 ← β2 m_2 + (1 − β2) ĝ_t ⊙ ĝ_t
   correct bias: m̂_1 ← m_1 / (1 − β1^{t+1}),  m̂_2 ← m_2 / (1 − β2^{t+1})
   update θ_{t+1} ← θ_t − γ m̂_1 / (δ + √m̂_2)  (element-wise)
(d) (20 points) Set the learning rate to 0.5, 10^-2, and 10^-5. Run the neural network for 15 epochs, repeat this 3 times, and compare the resulting average accuracies. Plot the training loss for the different optimizers. Compare the performance of the adaptive learning-rate methods against SGD and SGD with momentum.
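For reference only (not a substitute for the graded implementations in optimizers.py, whose interface follows PyTorch's Optimizer class), the three update rules above can be sketched on plain tensors roughly as follows; the function and variable names are illustrative.

```python
import torch

# Rough, illustrative per-parameter updates for the three adaptive methods.
# Each function takes the current parameter theta, the minibatch gradient g,
# and the method's state, and returns the updated parameter and state.

def adagrad_update(theta, g, r, gamma=1e-2, delta=1e-7):
    r = r + g * g                            # accumulate squared gradients
    theta = theta - gamma / (delta + r.sqrt()) * g
    return theta, r

def rmsprop_update(theta, g, r, gamma=1e-3, delta=1e-7, tau=0.9):
    r = tau * r + (1.0 - tau) * g * g        # exponentially decayed average
    theta = theta - gamma / (delta + r.sqrt()) * g
    return theta, r

def adam_update(theta, g, m1, m2, t, gamma=1e-3, delta=1e-8, beta1=0.9, beta2=0.999):
    m1 = beta1 * m1 + (1.0 - beta1) * g      # first-moment estimate
    m2 = beta2 * m2 + (1.0 - beta2) * g * g  # second-moment estimate
    m1_hat = m1 / (1.0 - beta1 ** (t + 1))   # bias correction
    m2_hat = m2 / (1.0 - beta2 ** (t + 1))
    theta = theta - gamma * m1_hat / (delta + m2_hat.sqrt())
    return theta, m1, m2
```

Inside optimizers.py these rules would be applied to every parameter tensor within the optimizer's step() method, with the state (r, m_1, m_2, t) stored per parameter.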
3 Guidelines for the preparation and the submission of the homework
Work on your own. Do not copy or distribute your code to other students in the class. Do not reuse any other code related to this homework. Here are a few warnings and suggestions for preparing and submitting your homework.
• This homework is due at 4:00PM, 15 November, 2019
• Submit your work before the due date. Late submissions are not allowed; you will receive 0 points for this homework if you submit it after the deadline.
• Questions worth 0 points are for self-study. You do not need to answer them in the report.
• Your final report should include detailed answers and it needs to be submitted in PDF format.
• The PDF file can be a scan or a photo. Make sure that it is legible.
• The results of your simulations should be presented in the final report with clear explanations and a comparative evaluation.
• We provide PyTorch scripts that you can use to implement the algorithms, but you may also implement them from scratch using any other convenient programming tool (in that case, you should also write the code to time your algorithms and to evaluate their efficiency by plotting the necessary graphs).
• Even if you use the PyTorch scripts that we provide, you are responsible for the entire code you submit. Apart from completing the missing parts in the scripts, you might also need to change some of the existing parts and parameters, depending on your implementation.
• The code should be well documented and should work properly. Make sure that your code runs without errors. If the code you submit does not run, you will not receive any credit for the related exercises.
• Compress your code and your final report into a single ZIP file, name it ee556_2019_hw2_NameSurname.zip, and submit it through the Moodle page of the course.
