Neural Networks
For this assignment, you are asked to implement a neural network and use it to classify the MNIST database of handwritten digits (0-9). The architecture you will implement is based on the multi-layer perceptron (MLP, just another term for the fully connected feedforward networks we discussed in the lecture), shown below. It is designed for a K-class classification problem.
Let $(x \in \mathbb{R}^D,\ y \in \{1, 2, \dots, K\})$ be a labeled instance; such an MLP performs the following computations:

$$
\begin{aligned}
\text{input features:}\quad & x \in \mathbb{R}^D \\
\text{linear}^{(1)}:\quad & u = W^{(1)} x + b^{(1)}, \quad W^{(1)} \in \mathbb{R}^{M \times D} \text{ and } b^{(1)} \in \mathbb{R}^{M} \\
\text{tanh:}\quad & h = \frac{2}{1 + e^{-2u}} - 1 \\
\text{relu:}\quad & h = \max\{0, u\} = \begin{bmatrix} \max\{0, u_1\} \\ \vdots \\ \max\{0, u_M\} \end{bmatrix} \\
\text{linear}^{(2)}:\quad & a = W^{(2)} h + b^{(2)}, \quad W^{(2)} \in \mathbb{R}^{K \times M} \text{ and } b^{(2)} \in \mathbb{R}^{K} \\
\text{softmax:}\quad & z = \begin{bmatrix} \frac{e^{a_1}}{\sum_k e^{a_k}} \\ \vdots \\ \frac{e^{a_K}}{\sum_k e^{a_k}} \end{bmatrix} \\
\text{predicted label:}\quad & \hat{y} = \arg\max_k z_k.
\end{aligned}
$$
For a $K$-class classification problem, one popular loss function for training (i.e., to learn $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, $b^{(2)}$) is the cross-entropy loss. Specifically, we denote the cross-entropy loss with respect to the training example $(x, y)$ by $l$:

$$
l = -\log(z_y) = \log\Big(1 + \sum_{k \neq y} e^{a_k - a_y}\Big).
$$

Note that one should look at $l$ as a function of the parameters of the network, that is, $W^{(1)}, b^{(1)}, W^{(2)}$ and $b^{(2)}$. For ease of notation, let us define the one-hot (i.e., 1-of-$K$) encoding of a class $y$ as

$$
\mathbf{y} \in \mathbb{R}^K \quad \text{with} \quad y_k =
\begin{cases}
1, & \text{if } y = k, \\
0, & \text{otherwise,}
\end{cases}
$$

so that

$$
l = -\sum_k y_k \log z_k
  = -\mathbf{y}^{\mathsf{T}}
    \begin{bmatrix} \log z_1 \\ \vdots \\ \log z_K \end{bmatrix}
  = -\mathbf{y}^{\mathsf{T}} \log z.
$$
We can then perform error backpropagation, a way to compute partial derivatives (or gradients) w.r.t. the parameters of a neural network, and use gradient-based optimization to learn the parameters.
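As a concrete example of the first backpropagation step (not spelled out in the handout, but a standard identity for the softmax/cross-entropy combination defined above), the gradient of $l$ with respect to the pre-softmax activations $a$ has a particularly simple form:

$$
\frac{\partial l}{\partial a_k} = z_k - y_k, \qquad \text{i.e.,} \qquad \frac{\partial l}{\partial a} = z - \mathbf{y},
$$

which is the quantity that the backward pass feeds into $\text{linear}^{(2)}$.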
Submission: You need to submit both neural_networks.py and utils.py.
Q1. Mini-batch Stochastic Gradient Descent
First, you need to implement mini-batch stochastic gradient descent, a gradient-based optimization method, to learn the parameters of the neural network. You need to implement two alternatives for SGD, one without momentum and one with momentum. We will pass a variable $\alpha$ to indicate which option is used: when $\alpha = 0$, the parameters are updated by the gradient alone; when $\alpha > 0$, the parameters are updated with momentum, and $\alpha$ also serves as the discount factor, as follows:

$$
\begin{aligned}
v_t &= \alpha\, v_{t-1} - \eta\, \nabla_{w} l_t, \\
w_t &= w_{t-1} + v_t.
\end{aligned}
$$

You can use the formula above to update the weights. Here, $\alpha$ is the discount factor, with $\alpha \in (0, 1)$; it is given by us and you do not need to adjust it. $\eta$ is the learning rate, which is also given by us. $v_t$ is the velocity update (a.k.a. the momentum update), and $\nabla_{w} l_t$ is the gradient of the loss at step $t$.
TODO 1
You need to complete def miniBatchStochasticGradientDescent(model, momentum, _lambda, _alpha, _learning_rate) in neural_networks.py.
Notice that a complete mini-batch SGD pipeline would also require choosing the mini-batch size and the number of epochs. In this assignment, we omit this step: both the mini-batch size and the number of epochs have already been given, and you do not need to adjust them.
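For intuition, here is a minimal sketch of the update rule above. It assumes, hypothetically, that parameters, gradients, and velocities are stored in plain dicts of NumPy arrays keyed by parameter name; the actual model object and argument layout in neural_networks.py may differ.

```python
import numpy as np

def sgd_step_sketch(params, grads, velocities, alpha, learning_rate):
    """One mini-batch SGD step, with or without momentum (illustrative only)."""
    for key in params:
        if alpha > 0:
            # momentum: v_t = alpha * v_{t-1} - learning_rate * gradient
            velocities[key] = alpha * velocities[key] - learning_rate * grads[key]
            # w_t = w_{t-1} + v_t
            params[key] += velocities[key]
        else:
            # plain SGD: w_t = w_{t-1} - learning_rate * gradient
            params[key] -= learning_rate * grads[key]


# Example usage with a single weight matrix:
params = {'W': np.zeros((3, 2))}
grads = {'W': np.ones((3, 2))}
velocities = {'W': np.zeros((3, 2))}
sgd_step_sketch(params, grads, velocities, alpha=0.9, learning_rate=0.01)
```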
Q2. Linear Layer
Second, you need to implement the linear layer of the MLP. In this part, you need to implement three Python functions in class linear_layer.
In the function def __init__(self, input_D, output_D), you need to initialize W with random values using np.random.normal such that the mean is 0 and the standard deviation is 0.1. You also need to initialize the gradients to zeros in the same function.
The forward and backward passes of a linear layer can be written as

$$
\begin{aligned}
\text{forward pass:}\quad & u = \text{linear}^{(1)}.\text{forward}(x) = W^{(1)} x + b^{(1)}, \text{ where } W^{(1)} \text{ and } b^{(1)} \text{ are its parameters,} \\
\text{backward pass:}\quad & \Big[\frac{\partial l}{\partial x},\ \frac{\partial l}{\partial W^{(1)}},\ \frac{\partial l}{\partial b^{(1)}}\Big] = \text{linear}^{(1)}.\text{backward}\Big(x, \frac{\partial l}{\partial u}\Big).
\end{aligned}
$$

You can use the formula above as a reference to implement the forward pass def forward(self, X) and the backward pass def backward(self, X, grad) in class linear_layer. In the backward pass, you only need to return backward_output, but you also need to compute the gradients of W and b there.
TODO 2
You need to complete def __init__(self, input_D, output_D) in class linear_layer of neural_networks.py.
TODO 3
You need to complete def forward(self, X) in class linear_layer of neural_networks.py.
TODO 4
You need to complete def backward(self, X, grad) in class linear_layer of neural_networks.py.
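The sketch below shows one possible shape of such a layer, assuming X stores one example per row (so the forward pass computes XW + b in matrix form) and gradients live in a dict called gradient. The attribute names, and the choice to also initialize b randomly, are illustrative assumptions, not necessarily what the starter code uses.

```python
import numpy as np

class LinearLayerSketch:
    """Illustrative fully connected layer: forward computes u = X W + b."""

    def __init__(self, input_D, output_D):
        # W drawn from N(0, 0.1); b initialized the same way here (an assumption,
        # the handout only specifies W explicitly); gradients start at zero
        self.params = {
            'W': np.random.normal(0.0, 0.1, (input_D, output_D)),
            'b': np.random.normal(0.0, 0.1, (1, output_D)),
        }
        self.gradient = {
            'W': np.zeros((input_D, output_D)),
            'b': np.zeros((1, output_D)),
        }

    def forward(self, X):
        # X: (N, input_D) -> u: (N, output_D)
        return X @ self.params['W'] + self.params['b']

    def backward(self, X, grad):
        # grad is dl/du with shape (N, output_D)
        self.gradient['W'] = X.T @ grad                            # dl/dW
        self.gradient['b'] = np.sum(grad, axis=0, keepdims=True)   # dl/db
        return grad @ self.params['W'].T                           # dl/dX (backward_output)
```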
Q3. Activation function tanh
Now, you need to implement the activation function tanh. In this part, you need to implement two Python functions in class tanh. In def forward(self, X), you need to implement the forward pass; you also need to compute the derivative and implement def backward(self, X, grad), i.e. the backward pass, accordingly.
$$
\text{tanh:}\quad h = \frac{2}{1 + e^{-2u}} - 1
$$

You can use the formula above for tanh as a reference.
TODO 5
You need to complete def forward(self, X) in class tanh of neural_networks.py.
TODO 6
You need to complete def backward(self, X, grad) in class tanh of neural_networks.py.
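A minimal sketch of an elementwise tanh module, under the same illustrative assumptions as above (NumPy arrays in, NumPy arrays out; the method names follow the handout, everything else is an assumption):

```python
import numpy as np

class TanhSketch:
    """Elementwise tanh activation: h = 2 / (1 + exp(-2u)) - 1."""

    def forward(self, X):
        return 2.0 / (1.0 + np.exp(-2.0 * X)) - 1.0

    def backward(self, X, grad):
        # d tanh(u)/du = 1 - tanh(u)^2, applied elementwise and multiplied
        # by the incoming gradient (chain rule)
        h = self.forward(X)
        return grad * (1.0 - h ** 2)
```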
Q4. Activation function relu
You need to implement another activation function called relu. In this part, you need to implement two Python functions in class relu. In def forward(self, X), you need to implement the forward pass; you also need to compute the derivative and implement def backward(self, X, grad), i.e. the backward pass, accordingly.
$$
\text{relu:}\quad h = \max\{0, u\} = \begin{bmatrix} \max\{0, u_1\} \\ \vdots \\ \max\{0, u_M\} \end{bmatrix}
$$

You can use the formula above for relu as a reference.
TODO 7
You need to complete def forward(self, X) in class relu of neural_networks.py.
TODO 8
You need to complete def backward(self, X, grad) in class relu of neural_networks.py.
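A corresponding sketch for relu, under the same illustrative assumptions:

```python
import numpy as np

class ReluSketch:
    """Elementwise ReLU activation: h = max{0, u}."""

    def forward(self, X):
        return np.maximum(0.0, X)

    def backward(self, X, grad):
        # the derivative is 1 where X > 0 and 0 elsewhere, so the incoming
        # gradient is simply masked (chain rule)
        return grad * (X > 0).astype(X.dtype)
```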
Q5. Dropout (15 points)
To prevent overfitting, we usually add regularization. Dropout is another way of handling overfitting. In this part, you will first read and understand def forward(self, X, is_train), i.e. the forward pass of class dropout. You will then derive the partial derivatives accordingly to implement def backward(self, X, grad), i.e. the backward pass of class dropout. Now take an intermediate variable $q \in \mathbb{R}^J$, which is the output of one of the layers. We define the forward and backward passes of dropout as follows; the forward pass obtains the output after dropout:

$$
\text{forward pass:}\quad s = \text{dropout.forward}(q \in \mathbb{R}^J) = \frac{1}{1 - r}
\begin{bmatrix}
\mathbb{1}[p_1 \geq r]\, q_1 \\ \vdots \\ \mathbb{1}[p_J \geq r]\, q_J
\end{bmatrix},
$$

where each $p_j$ is generated randomly from $[0, 1)$ for $j \in \{1, \dots, J\}$, and $r \in [0, 1)$ is a pre-defined scalar named the dropout rate, which is given to you.
The backward pass computes the partial derivative of the loss with respect to $q$ from the partial derivative with respect to the forward-pass output $s$, i.e. from $\frac{\partial l}{\partial s}$.
Note that $p_j$, $j \in \{1, \dots, J\}$, and $r$ are not learned, so we do not need to compute derivatives w.r.t. them. You do not need to find the best $r$, since we have picked it for you. Moreover, $p_j$, $j \in \{1, \dots, J\}$, are re-sampled in every forward pass and are kept for the corresponding backward pass.
TODO 9
You need to complete def backward(self, X, grad) in class dropout of neural_networks.py.
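A minimal sketch of what such a module could look like, assuming the scaled mask (the indicators $\mathbb{1}[p_j \geq r]$ divided by $1 - r$) is saved during the forward pass and reused in the backward pass; the attribute names are illustrative:

```python
import numpy as np

class DropoutSketch:
    """Inverted dropout with dropout rate r: unit j is kept only if p_j >= r."""

    def __init__(self, r):
        self.r = r
        self.mask = None  # set in forward, reused in backward

    def forward(self, X, is_train):
        if is_train:
            # sample p_j ~ U[0, 1) and build the scaled keep-mask 1[p_j >= r] / (1 - r)
            self.mask = (np.random.uniform(0.0, 1.0, X.shape) >= self.r) / (1.0 - self.r)
        else:
            # dropout is disabled at test time
            self.mask = np.ones_like(X)
        return X * self.mask

    def backward(self, X, grad):
        # dl/dq = dl/ds * ds/dq, and ds/dq is exactly the stored scaled mask
        return grad * self.mask
```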
Q6. Connecting the dots
In this part, you will combine the modules written in Q1 to Q5 by implementing the TODO snippets in def main(main_params, optimization_type="minibatch_sgd"), i.e. the main function. After implementing the forward and backward passes of the MLP layers in Q1 to Q5, you will now call the forward and backward methods of every layer in the model in the appropriate order, based on the architecture.
TODO 10
You need to complete main(main_params, optimization_type="minibatch_sgd") in neural_networks.py.
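To make the ordering concrete, here is a rough end-to-end pass wired together from the illustrative sketch classes above (the dictionary keys, shapes, and the averaging over the batch are all assumptions; the real main function in neural_networks.py defines its own modules, loss, and bookkeeping):

```python
import numpy as np

# assumes LinearLayerSketch, ReluSketch, DropoutSketch from the sketches above are in scope
np.random.seed(0)
x_batch = np.random.randn(5, 784)                     # 5 examples, 784 input features
y_onehot = np.eye(10)[np.random.randint(0, 10, 5)]    # one-hot labels, K = 10

model = {
    'L1': LinearLayerSketch(784, 128),
    'nonlinear1': ReluSketch(),                       # tanh or relu
    'drop1': DropoutSketch(r=0.5),
    'L2': LinearLayerSketch(128, 10),
}

# forward: input -> linear(1) -> activation -> dropout -> linear(2) -> softmax
a1 = model['L1'].forward(x_batch)
h1 = model['nonlinear1'].forward(a1)
d1 = model['drop1'].forward(h1, is_train=True)
a2 = model['L2'].forward(d1)
z = np.exp(a2 - a2.max(axis=1, keepdims=True))
z /= z.sum(axis=1, keepdims=True)                     # softmax probabilities

# backward: start from dl/da = z - y (see the identity above), then reverse order
grad_a2 = (z - y_onehot) / x_batch.shape[0]
grad_d1 = model['L2'].backward(d1, grad_a2)
grad_h1 = model['drop1'].backward(h1, grad_d1)
grad_a1 = model['nonlinear1'].backward(a1, grad_h1)
_ = model['L1'].backward(x_batch, grad_a1)
# one mini-batch SGD step (Q1) would now update the parameters of L1 and L2
```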
Google Colab
Google Colab is a free online Jupyter notebook environment made available by Google for researchers and students, in which the notebook is backed by a GPU.
A Jupyter notebook is a Python notebook with executable cells accompanied by explanatory text, which makes it a good way to document code.
GPUs are now the standard way to compute weights and gradients when training neural networks; they are faster than CPUs because of their inherently parallel nature.
We highly suggest trying it out for the computation of your forward networks, and tinkering with num_epochs and the learning rate to see how the training loss varies. You can find it here:
https://colab.research.google.com/
Note
Do NOT change the hyperparameters in your submission on Vocareum, even if your changes give a better training loss. Changing the hyperparameters is only for your own understanding of how gradient descent optimizes the network.