Submission: You must submit your solutions as a PDF through MarkUs. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner) as long as it is readable.
Late Submission: MarkUs will remain open until 3 days after the deadline, after which no late submissions will be accepted. The late penalty is 10% per day, rounded up.
Weekly homeworks are individual work. See the Course Information handout[1] for detailed policies.
- LSTM Gradient [4pts] Here, you'll derive the Backprop Through Time equations for the univariate version of the Long Short-Term Memory (LSTM) architecture.
For reference, here are the computations it performs:
i(t) = σ(w_ix x(t) + w_ih h(t-1))
f(t) = σ(w_fx x(t) + w_fh h(t-1))
o(t) = σ(w_ox x(t) + w_oh h(t-1))
g(t) = tanh(w_gx x(t) + w_gh h(t-1))
c(t) = f(t) c(t-1) + i(t) g(t)
h(t) = o(t) tanh(c(t))
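As a concrete reference, here is a minimal sketch of one forward step of this univariate LSTM in NumPy. The weight names mirror the subscripts in the equations (w_ix, w_ih, etc.); the dict-based packaging of the weights is our own convention, not part of the handout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of the univariate LSTM. x, h_prev, c_prev are scalars;
    w is a dict of the eight scalar weights (a packaging assumption)."""
    i = sigmoid(w["ix"] * x + w["ih"] * h_prev)   # input gate
    f = sigmoid(w["fx"] * x + w["fh"] * h_prev)   # forget gate
    o = sigmoid(w["ox"] * x + w["oh"] * h_prev)   # output gate
    g = np.tanh(w["gx"] * x + w["gh"] * h_prev)   # candidate cell value
    c = f * c_prev + i * g                        # new cell state
    h = o * np.tanh(c)                            # new hidden state
    return h, c
```

Unrolling this step over a sequence (feeding each output h, c back in as h_prev, c_prev) gives the computation graph that the BPTT equations below differentiate through.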
- [3pts] Derive the Backprop Through Time equations for the activations h(t) and c(t) and for the gates i(t), f(t), o(t), and g(t).
You don't need to vectorize anything or factor out any repeated subexpressions.
- [1pt] Derive the BPTT equation for the weight w_ix:

w̄_ix =
(The other weights are basically the same, so we won't make you write those out.)
- [optional, no points] Based on your answers above, explain why the gradient doesn't explode if the values of the forget gates are very close to 1 and the values of the input and output gates are very close to 0. (Your answer should involve both h(t) and c(t).)
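A quick numerical illustration of the cell-state part of this behavior: if the gates are held fixed (a simplification; in the real LSTM they depend on x and h), the recursion c(t) = f c(t-1) + i g has Jacobian dc(T)/dc(0) = f^T, which stays near 1 when f is near 1. The gate values and horizon below are arbitrary choices for the demo.

```python
import numpy as np

def rollout_c(c0, f, i, g, T):
    """Iterate c(t) = f*c(t-1) + i*g for T steps with the gates
    held constant (a simplification of the full LSTM)."""
    c = c0
    for _ in range(T):
        c = f * c + i * g
    return c

T, f, i, g = 50, 0.999, 0.001, 0.5
eps = 1e-6
# Central finite difference of c(T) with respect to c(0).
grad_fd = (rollout_c(1.0 + eps, f, i, g, T)
           - rollout_c(1.0 - eps, f, i, g, T)) / (2 * eps)
# The true Jacobian is f**T, which is close to 1 when f is near 1,
# so the gradient through the cell state neither explodes nor vanishes.
assert abs(grad_fd - f**T) < 1e-4
```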
- Multidimensional RNN [3pts] One of the predecessors to the PixelRNN architecture was the multidimensional RNN (MDRNN). This is like the RNNs we discussed in lecture, except that instead of a 1-D sequence, we have a 2-D grid structure. Analogously to how ordinary RNNs have an input vector and a hidden vector for every time step, MDRNNs have an input vector and hidden vector for every grid square. Each hidden unit receives bottom-up connections from the corresponding input square, as well as recurrent connections from its north and west neighbors.
The activations are computed as follows:

h(i,j) = φ(W_in x(i,j) + W_N h(i-1,j) + W_W h(i,j-1)),

where W_in holds the bottom-up weights, W_N and W_W hold the recurrent weights from the north and west neighbors, and φ is the activation function.
For simplicity, we assume there are no bias parameters. Suppose the grid is G × G, the input dimension is D, and the hidden dimension is H.
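The update above can be sketched as follows; the weight-matrix names and the choice of tanh for φ are assumptions for illustration, and out-of-grid neighbors are treated as zero vectors.

```python
import numpy as np

def mdrnn_hidden(x, W_in, W_north, W_west, phi=np.tanh):
    """Hidden activations of an MDRNN on a G x G grid.
    x: (G, G, D) inputs; W_in: (H, D); W_north, W_west: (H, H)."""
    G = x.shape[0]
    H = W_in.shape[0]
    h = np.zeros((G, G, H))
    # Note: cells on the same anti-diagonal (i + j = const) don't depend
    # on each other, so each anti-diagonal could be computed in parallel.
    for i in range(G):
        for j in range(G):
            north = h[i - 1, j] if i > 0 else np.zeros(H)
            west = h[i, j - 1] if j > 0 else np.zeros(H)
            h[i, j] = phi(W_in @ x[i, j] + W_north @ north + W_west @ west)
    return h
```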
- [1pt] How many weights does this architecture have? How many arithmetic operations are required to compute the hidden activations? (You only need to give big-O, not an exact count.)
- [1pt] Suppose that in each step, you can compute as many matrix-vector multiplications as you like. How many steps are required to compute the hidden activations? Explain your answer.
- [1pt] Give one advantage and one disadvantage of an MDRNN compared to a conv net.
- Reversibility [3pts] In lecture, we discussed reversible generator architectures, which enable efficient maximum likelihood training. In this question, we consider another (perhaps surprising) example of a reversible operation: gradient descent with momentum. Suppose the parameter vector θ (and hence also the velocity vector p) are both D-dimensional. Recall that the updates are as follows:
p(k+1) = μ p(k) - α ∇J(θ(k))
θ(k+1) = θ(k) + p(k+1)

where α is the learning rate and μ is the momentum parameter. If we denote s(k) = (θ(k), p(k)), then we can think of the above equations as defining a function s(k+1) = f(s(k)).
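For concreteness, one step of this update, written as code (assuming the standard momentum form p(k+1) = μ p(k) - α ∇J(θ(k)), θ(k+1) = θ(k) + p(k+1), with learning rate alpha and momentum mu):

```python
import numpy as np

def momentum_step(theta, p, grad_J, alpha, mu):
    """One step s(k+1) = f(s(k)) of gradient descent with momentum.
    grad_J maps a parameter vector to the gradient of J at that point."""
    p_next = mu * p - alpha * grad_J(theta)
    theta_next = theta + p_next
    return theta_next, p_next

# Example on the quadratic J(theta) = 0.5 * ||theta||^2, so grad_J = identity.
theta1, p1 = momentum_step(np.ones(2), np.zeros(2), lambda th: th, 0.1, 0.9)
```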
- [1pt] Show how to compute the inverse, s(k) = f⁻¹(s(k+1)).
- [2pts] Find the determinant of the Jacobian, i.e. det ∂s(k+1)/∂s(k).
Hint: first write the Jacobian as a product of two matrices, one for each step of the above algorithm.
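Both parts can be sanity-checked numerically. The sketch below (assuming the standard momentum form with learning rate alpha and momentum mu) inverts the update by running its two sub-steps backward, then estimates the Jacobian of f by finite differences on a toy quadratic; the determinant it measures is what your derivation should predict. Treat this as a self-check, not a substitute for the derivation.

```python
import numpy as np

def momentum_step(theta, p, grad_J, alpha, mu):
    p_next = mu * p - alpha * grad_J(theta)
    return theta + p_next, p_next

def momentum_step_inv(theta_next, p_next, grad_J, alpha, mu):
    """Undo one step: theta is recovered first (it only needs p_next),
    then the velocity update is solved for p."""
    theta = theta_next - p_next
    p = (p_next + alpha * grad_J(theta)) / mu
    return theta, p

# Round-trip check on the toy quadratic J(theta) = 0.5 * ||theta||^2.
grad_J = lambda th: th
alpha, mu = 0.1, 0.9
theta0, p0 = np.array([1.0, -2.0]), np.array([0.5, 0.0])
theta1, p1 = momentum_step(theta0, p0, grad_J, alpha, mu)
theta_back, p_back = momentum_step_inv(theta1, p1, grad_J, alpha, mu)
assert np.allclose(theta_back, theta0) and np.allclose(p_back, p0)

# Finite-difference Jacobian of s -> f(s), with s = (theta, p) stacked.
def f(s):
    th1, pp1 = momentum_step(s[:2], s[2:], grad_J, alpha, mu)
    return np.concatenate([th1, pp1])

s0 = np.concatenate([theta0, p0])
eps = 1e-6
J = np.array([(f(s0 + eps * e) - f(s0 - eps * e)) / (2 * eps)
              for e in np.eye(4)]).T
det = np.linalg.det(J)  # compare with your derived expression (here D = 2)
```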
[1] http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/syllabus.pdf