Submission: You must submit your solutions as a PDF through MarkUs. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner) as long as it is readable.
Late Submission: MarkUs will remain open until 3 days after the deadline, after which no late submissions will be accepted. The late penalty is 10% per day, rounded up.
Weekly homeworks are individual work. See the Course Information handout[1] for detailed policies.
Due to the shortened time period, this assignment has only one question, worth 6 points. You get the remaining 4 points for free.
- Variational Free Energy [6pts] Here, your job is to derive some of the formulas relating to the variational free energy (VFE) which we maximize when we train a VAE. Recall that the VFE is defined as:
$$ F(q) = \mathbb{E}_q[\log p(x \mid z)] - D_{\mathrm{KL}}(q(z) \,\|\, p(z)), $$
and KL divergence is defined as
$$ D_{\mathrm{KL}}(q(z) \,\|\, p(z)) = \mathbb{E}_q[\log q(z) - \log p(z)]. $$
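Written out as integrals, which may be a convenient starting point when expanding definitions, the expectation and the KL divergence above read as follows (for a continuous latent variable z and a generic function f, which is introduced here only for illustration):

```latex
\mathbb{E}_q[f(z)] = \int q(z)\, f(z)\, dz,
\qquad
D_{\mathrm{KL}}(q(z) \,\|\, p(z)) = \int q(z) \log \frac{q(z)}{p(z)}\, dz .
```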
We assume the prior p(z) is a standard Gaussian:
$$ p(z) = \mathcal{N}(z; 0, I) = \prod_{i=1}^{D} p_i(z_i) = \prod_{i=1}^{D} \mathcal{N}(z_i; 0, 1). $$
And the variational approximation q(z) is a fully factorized (i.e. diagonal) Gaussian:
$$ q(z) = \mathcal{N}(z; \mu, \Sigma) = \prod_{i=1}^{D} q_i(z_i) = \prod_{i=1}^{D} \mathcal{N}(z_i; \mu_i, \sigma_i). $$
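As a quick numerical illustration of what "fully factorized" means, the sketch below checks that summing per-dimension Gaussian log-densities gives the same value as the multivariate formula with a diagonal covariance. It assumes that sigma_i denotes a standard deviation, so that the covariance is diag(sigma_i^2); the function names and the example values are hypothetical and chosen only for illustration.

```python
import math
import random


def log_normal_1d(z, mu, sigma):
    """Log-density of a univariate Gaussian; sigma is read as a standard deviation (assumption)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def log_q_factorized(z, mu, sigma):
    """log q(z) as a sum of per-dimension log-densities log q_i(z_i)."""
    return sum(log_normal_1d(zi, mi, si) for zi, mi, si in zip(z, mu, sigma))


def log_q_multivariate_diag(z, mu, sigma):
    """log q(z) from the multivariate formula, specialized to Sigma = diag(sigma_i^2)."""
    d = len(z)
    log_det = sum(2.0 * math.log(si) for si in sigma)  # log |Sigma|
    quad = sum(((zi - mi) / si) ** 2 for zi, mi, si in zip(z, mu, sigma))  # (z-mu)^T Sigma^{-1} (z-mu)
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + quad)


# Hypothetical example values, chosen only for illustration.
rng = random.Random(0)
mu = [0.5, -1.0, 2.0]
sigma = [1.0, 0.3, 2.5]
z = [mi + si * rng.gauss(0.0, 1.0) for mi, si in zip(mu, sigma)]  # one sample from q

print(log_q_factorized(z, mu, sigma))
print(log_q_multivariate_diag(z, mu, sigma))
```

The two printed values should agree up to floating-point rounding, since the log of a product of densities is the sum of their logs.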
For reference, here are the formulas for the univariate and multivariate Gaussian distributions:
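In standard notation, the two densities can be written as below. This sketch assumes that sigma is read as a standard deviation (so the variance is sigma^2) and that Sigma is a D x D covariance matrix; the symbol x here is a generic argument, not the observation from the VFE.

```latex
\mathcal{N}(x; \mu, \sigma)
  = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

\mathcal{N}(x; \mu, \Sigma)
  = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)
```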
- (a) [1pt] Show that
$$ F(q) = \log p(x) - D_{\mathrm{KL}}(q(z) \,\|\, p(z \mid x)). $$
(Hint: expand out definitions and apply Bayes Rule.)
- (b) [1pt] Show that the KL term decomposes as a sum of KL terms for individual dimensions. In particular,
$$ D_{\mathrm{KL}}(q(z) \,\|\, p(z)) = \sum_{i} D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i)). $$
CSC421/2516 Winter 2019 Homework 5
- (c) [2pts] Give an explicit formula for the KL divergence $D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i))$. This should be a mathematical expression involving $\mu_i$ and $\sigma_i$. If you like, you may suppress the $i$ subscripts in your solution.
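- (d) [2pts] One way to do gradient descent on the KL term is to apply the formula from part (c). Another approach is to compute stochastic gradients using the reparameterization trick:
$$ D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i)) = \mathbb{E}_{\epsilon_i}\left[ \log q_i(z_i) - \log p_i(z_i) \right], $$
where
$$ z_i = \mu_i + \sigma_i \epsilon_i $$
and
$$ \epsilon_i \sim \mathcal{N}(0, 1). $$
Show how to compute a stochastic estimate of $D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i))$ by doing backprop on the above equations. You may find it helpful to draw the computation graph. If you like, you may suppress the $i$ subscripts in your solution.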
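For checking a derivation numerically, the following sketch may help. It assumes the usual form of the reparameterization trick for a single dimension, z = mu + sigma * eps with eps drawn from N(0, 1) and sigma read as a standard deviation, and averages log q(z) - log p(z) over samples. The gradient routine uses central finite differences with the noise held fixed; it is a numerical stand-in for comparison, not the backprop derivation the question asks for. All function names and example values are hypothetical.

```python
import math
import random


def log_normal_1d(z, mu, sigma):
    """Log-density of a univariate Gaussian; sigma is read as a standard deviation (assumption)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def kl_estimate(mu, sigma, eps_samples):
    """Monte Carlo average of log q(z) - log p(z) over reparameterized samples."""
    total = 0.0
    for eps in eps_samples:
        z = mu + sigma * eps  # reparameterized sample from q
        total += log_normal_1d(z, mu, sigma) - log_normal_1d(z, 0.0, 1.0)
    return total / len(eps_samples)


def finite_difference_grads(mu, sigma, eps_samples, h=1e-5):
    """Central differences of the estimate in mu and sigma, reusing the same noise samples."""
    d_mu = (kl_estimate(mu + h, sigma, eps_samples)
            - kl_estimate(mu - h, sigma, eps_samples)) / (2.0 * h)
    d_sigma = (kl_estimate(mu, sigma + h, eps_samples)
               - kl_estimate(mu, sigma - h, eps_samples)) / (2.0 * h)
    return d_mu, d_sigma


# Hypothetical example values, chosen only for illustration.
rng = random.Random(0)
eps_samples = [rng.gauss(0.0, 1.0) for _ in range(10000)]

print(kl_estimate(1.0, 0.5, eps_samples))
print(finite_difference_grads(1.0, 0.5, eps_samples))
```

Because the noise samples are drawn once and reused, the estimate is a deterministic function of mu and sigma, which is what makes it differentiable. Gradients obtained by hand or by backprop can be compared against the finite-difference values on the same samples.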
[1] http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/syllabus.pdf