DS-GA1008 Homework 1: Backpropagation


The goal of homework 1 is to help you understand the common techniques used in deep learning and how to update network parameters using the backpropagation algorithm.

Part 1 has four sub-parts: 1.1, 1.2, and 1.3 mainly deal with the theory of the backpropagation algorithm, whereas 1.4 tests conceptual knowledge of deep learning. For parts 1.2 and 1.3, you need to answer the questions with mathematical equations. You should put all your answers in a PDF file; we will not accept any scanned hand-written answers. It is recommended to use LaTeX.

For part 2, you need to program in Python. It requires you to implement your own forward and backward passes without using autograd. You need to submit your mlp.py file for this part.

The due date of homework 1 is 23:55 EST on 09/27. Submit the following files in a zip file named your_net_id.zip through NYU Brightspace:

  theory.pdf
  mlp.py

The following behaviors will result in a penalty to your final score:

  1. 5% penalty for submitting your files without using the correct format (including naming the zip file, PDF file, or Python file wrong, or adding extra files to the zip folder, such as the testing scripts from part 2).
  2. 20% penalty for late submission within the first 24 hours. We will not accept any late submission after the first 24 hours.
  3. 20% penalty for a code submission that cannot be executed using the steps we mention in part 2. So please test your code before submitting it.

1 Theory (50pt)

To answer the questions in this part, you need some basic knowledge of linear algebra and matrix calculus. You also need to follow these instructions:

  1. Every vector is treated as a column vector.
  2. You need to use the numerator-layout notation for matrix calculus; please refer to Wikipedia for the notation. (A short illustration of this convention follows the list.)
  3. You are only allowed to use vectors and matrices; you cannot use tensors in any of your answers.
  4. A missing transpose is considered a wrong answer.
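
As a quick sketch of the numerator-layout convention from instruction 2 (this restates the standard definition and is not part of the graded questions):

\[
f : \mathbb{R}^n \to \mathbb{R}^m
\;\Rightarrow\;
\frac{\partial f}{\partial x} \in \mathbb{R}^{m \times n},
\qquad
\left( \frac{\partial f}{\partial x} \right)_{ij} = \frac{\partial f_i}{\partial x_j},
\]

so for a scalar loss $\ell$ and a column vector $v \in \mathbb{R}^K$, the derivative $\frac{\partial \ell}{\partial v}$ is a $1 \times K$ row vector; this is where missing transposes typically creep in.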

1.1 Two-Layer Neural Nets

You are given the following neural net architecture:

\[
\text{Linear}_1 \to f \to \text{Linear}_2 \to g
\]

where $\text{Linear}_i(x) = W^{(i)}x + b^{(i)}$ is the $i$-th affine transformation, and $f, g$ are element-wise nonlinear activation functions. When an input $x \in \mathbb{R}^n$ is fed to the network, $\hat{y} \in \mathbb{R}^K$ is obtained as the output.
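
For concreteness, a minimal sketch of this composition as PyTorch tensor operations (the sizes n, d, K and the specific choices of f and g are illustrative assumptions, not part of the assignment):

    import torch

    n, d, K = 4, 3, 2                      # assumed input, hidden, output sizes
    x = torch.randn(n, 1)                  # input as a column vector
    W1, b1 = torch.randn(d, n), torch.randn(d, 1)
    W2, b2 = torch.randn(K, d), torch.randn(K, 1)

    z1 = W1 @ x + b1                       # Linear_1(x) = W^(1) x + b^(1)
    z2 = torch.relu(z1)                    # f (example choice)
    z3 = W2 @ z2 + b2                      # Linear_2(z2) = W^(2) z2 + b^(2)
    y_hat = torch.sigmoid(z3)              # g (example choice), output in R^K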

1.2 Regression Task

We would like to perform a regression task. We choose $f(\cdot) = (\cdot)^+ = \mathrm{ReLU}(\cdot)$ and $g$ to be the identity function. To train this network, we choose the MSE loss function $\ell_{\mathrm{MSE}}(\hat{y}, y) = \|\hat{y} - y\|^2$, where $y$ is the target output.
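
Spelled out, these choices simply compose the definitions above (the $z_i$ names anticipate question (c) below):

\[
z_1 = W^{(1)}x + b^{(1)}, \quad
z_2 = (z_1)^+, \quad
z_3 = W^{(2)}z_2 + b^{(2)}, \quad
\hat{y} = g(z_3) = z_3, \quad
\ell_{\mathrm{MSE}} = \|\hat{y} - y\|^2.
\]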

(a) (1pt) Name and mathematically describe the 5 programming steps you would take to train this model with PyTorch using SGD on a single batch of data. (A sketch of such a loop appears after this question block.)

(b) (5pt) For a single data point $(x, y)$, write down all inputs and outputs of the forward pass for each layer. You can only use the variables $x, y, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$ in your answer (note that $\text{Linear}_i(x) = W^{(i)}x + b^{(i)}$).

(c) (8pt) Write down the gradients calculated in the backward pass. You can only use the following variables: $x, y, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, \frac{\partial \ell}{\partial \hat{y}}, \frac{\partial z_2}{\partial z_1}, \frac{\partial \hat{y}}{\partial z_3}$ in your answer, where $z_1, z_2, z_3, \hat{y}$ are the outputs of $\text{Linear}_1, f, \text{Linear}_2, g$.

(d) (3pt) Show us the elements of $\frac{\partial z_2}{\partial z_1}$, $\frac{\partial \hat{y}}{\partial z_3}$, and $\frac{\partial \ell}{\partial \hat{y}}$ (be careful about the dimensionality).
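For question (a), a minimal sketch of the usual single-batch SGD iteration in PyTorch (the model, sizes, and learning rate are placeholders; the graded answer should name and mathematically describe the steps):

    import torch

    model = torch.nn.Sequential(           # placeholder two-layer net
        torch.nn.Linear(4, 3), torch.nn.ReLU(), torch.nn.Linear(3, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.MSELoss()
    x, y = torch.randn(8, 4), torch.randn(8, 2)   # one batch of data

    optimizer.zero_grad()                  # 1. zero the accumulated gradients
    y_hat = model(x)                       # 2. forward pass
    loss = criterion(y_hat, y)             # 3. compute the loss
    loss.backward()                        # 4. backward pass (backpropagation)
    optimizer.step()                       # 5. SGD parameter update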

1.3 Classification Task

We would like to perform a multi-class classification task, so we set both $f, g = \sigma$, the logistic sigmoid function $\sigma(z) = (1 + \exp(-z))^{-1}$.
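
A minimal sketch of this activation and its element-wise derivative, written without autograd (the function names are mine; the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ is standard):

    import torch

    def sigmoid(z):
        # logistic sigmoid: (1 + exp(-z))^(-1), applied element-wise
        return 1.0 / (1.0 + torch.exp(-z))

    def sigmoid_backward(z, grad_out):
        # chain rule: multiply the upstream gradient by sigma'(z)
        s = sigmoid(z)
        return grad_out * s * (1.0 - s)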

(a) (5pt + 8pt + 3pt) If you want to train this network, what do you need to change in the equations of (b), (c), and (d), assuming we are using the same MSE loss function?

(b) (1pt) Things are getting better. You realize that not all intermediate hidden activations need to be binary (or a soft version of binary). You decide to use $f(\cdot) = (\cdot)^+$ but keep $g$ as $\sigma$. Explain why this choice of $f$ can be beneficial for training a (deeper) network.

1.4 Conceptual Questions

(a) (2pt) Why is softmax actually soft(arg)max?

(b) (2pt) In what situations can soft(arg)max become unstable? (See the sketch after this list.)

(c) (2pt) Should we have two consecutive linear layers in a neural network? Why or why not?

(d) (4pt) We covered various activation functions in class, including ReLU, Tanh, Sigmoid, and LeakyReLU. Can you give one advantage and one disadvantage of each function?

(e) (4pt) What are 4 different types of linear transformations? What is the role of linear transformations and nonlinear transformations in a neural network?

(f) (2pt) How should we adjust the learning rate as we increase or decrease the batch size?
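
Regarding (b): softmax involves exponentials that overflow for large logits; a minimal sketch of the standard max-subtraction fix (the function name is mine):

    import torch

    def stable_softmax(z):
        # exp(z) overflows for large entries of z; shifting by max(z)
        # leaves the result unchanged but keeps every exponent <= 0
        shifted = z - z.max()
        e = torch.exp(shifted)
        return e / e.sum()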


2 Implementation (50pt)

You need to implement the forward pass and backward pass for Linear, ReLU, Sigmoid, MSE loss, and BCE loss in the attached mlp.py file. We provide three example test cases: test1.py, test2.py, and test3.py. We will test your implementation with other hidden test cases, so please create your own test cases to make sure your implementation is correct.
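
We have not seen the provided mlp.py scaffold, so the following is only a rough sketch of what a manual Linear forward/backward pair might look like; the class layout, attribute names, and batch-first shape convention are all assumptions:

    import torch

    class Linear:
        def __init__(self, in_features, out_features):
            self.W = torch.randn(out_features, in_features)
            self.b = torch.randn(out_features)

        def forward(self, x):
            # x: (batch, in_features); cache the input for the backward pass
            self.x = x
            return x @ self.W.t() + self.b

        def backward(self, grad_out):
            # grad_out: (batch, out_features), i.e. dl/dy for y = x W^T + b
            self.dW = grad_out.t() @ self.x    # dl/dW, shape (out, in)
            self.db = grad_out.sum(dim=0)      # dl/db, summed over the batch
            return grad_out @ self.W           # dl/dx, passed to the previous layer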

Recommendation: go through this PyTorch tutorial to have a thorough understanding of Tensors.

Extra instructions:

  1. Please use Python version 3.7 and PyTorch version 1.7.1. We recommend you use Miniconda to manage your virtual environment.
  2. We will put your mlp.py file under the same directory as the hidden test scripts and use the command python hiddenTestScriptName.py to check your implementation. So please make sure the file name is mlp.py and that it can be executed with the example test scripts we provided.
  3. You are not allowed to use PyTorch autograd functionality in your implementation.
  4. Be careful about the dimensionality of vectors and matrices in PyTorch; it does not necessarily follow the math you derived in part 1. (See the shape check below.)
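
To illustrate point 4: in the theory part every input is a column vector, while PyTorch code conventionally stacks a batch as rows, so the matrix products transpose (the sizes here are arbitrary):

    import torch

    x_math = torch.randn(4, 1)        # theory: a single column vector in R^4
    X_batch = torch.randn(8, 4)       # practice: a batch of 8 inputs, one per row
    W = torch.randn(3, 4)             # weights, shape (out_features, in_features)

    y_math = W @ x_math               # (3, 1): y = W x, as in part 1
    Y_batch = X_batch @ W.t()         # (8, 3): Y = X W^T, batch-first
    print(y_math.shape, Y_batch.shape)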
