- You are required to do this assignment in the Python (version 3) programming language. Do not use any auto-differentiation toolboxes (PyTorch, TensorFlow, Keras, etc.); you are only permitted, and encouraged, to vectorize your computation using the NumPy library.
- We recommend that you look through all of the problems before attempting the first problem. However, we do recommend you complete the problems in order, as the difficulty increases and later questions often rely on the completion of previous ones.
- If you haven't done so, use pdb to debug your code effectively.
Introduction
In this assignment, you will continue to develop your own version of PyTorch, which is of course called MyTorch (still a brilliant name; a master stroke. Well done!).
Homework Structure
Below is a list of files that are directly relevant to hw3.
IMPORTANT: First, copy the highlighted files/folders from the HW3P1 handout over to the corresponding folders that you used in hw1 and hw2.
NOTE: We recommend you make a backup of your hw3 files before copying everything over, just in case you break code or want to revert back to an earlier version.
Next, copy and paste the following code stubs from hw3/stubs.py into the correct files.
- Copy Slice(Function) into nn/functional.py.
- Copy Cat(Function) into nn/functional.py.
- Copy unsqueeze(), __getitem__(), and __len__() into tensor.py. These are methods for class Tensor.
- Copy function cat into tensor.py. This is an independent function and not a class method.
NOTE: You may need to define __pow__, which implicitly calls the Pow(Function), if you haven't done so already. This should be able to handle int exponents. Check the HW3P1 FAQ for help.
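If it helps, the following is a minimal sketch of what such a method could look like. It assumes your Function subclasses are invoked through an apply classmethod, as in the earlier parts of the homework; adjust the call (and how the exponent is passed) to whatever convention your functional.py actually uses.

```python
# Sketch for mytorch/tensor.py (illustrative; match your own Function-calling
# convention). Assumes `import mytorch.nn.functional as F` at the top of
# tensor.py and a Pow(Function) defined in mytorch/nn/functional.py.
def __pow__(self, exponent):
    # Invoked by the ** operator, e.g. t ** 2; should handle int exponents.
    assert isinstance(exponent, int)
    return F.Pow.apply(self, exponent)
```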
0.1 Running/Submitting Code
This section covers how to test code locally and how to create the final submission.
0.1.1 Running Local Autograder
Run the command below to calculate scores and test your code locally.
./grade.sh 3
If this doesn't work, converting line-endings may help:
sudo apt install dos2unix
dos2unix grade.sh
./grade.sh 3
If all else fails, you can run the autograder manually with this:
python3 ./autograder/hw3_autograder/runner.py
0.1.2 Running the Sandbox
We've provided sandbox.py: a script to test and easily debug basic operations and autograd.
Note: We will not provide new sandbox methods for this homework. You are required to write your own from now onwards.
python3 sandbox.py
0.1.3 Submitting to Autolab
Note: You can submit to Autolab even if you're not finished yet. You should do this early and often, as it guarantees you a minimum grade and helps avoid last-minute problems with Autolab.
Run this script to gather the needed files into a handin.tar file:
./create_tarball.sh
If this crashes (with some message about a hw4 folder), use dos2unix on this file too.
You can now upload handin.tar to Autolab.
1 RNN
In mytorch/nn/rnn.py we will implement a full-fledged Recurrent Neural Network module with the ability to handle variable length inputs in the same batch.
1.1 RNN Unit
Follow the starter code available in mytorch/nn/rnn.py to get a better sense of the various attributes of the RNNUnit.
Figure 1: The computation flow for the RNN Unit forward.
$h_t' = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$ (1)

The equation you should follow is given in Equation 1.
You are required to implement the forward method of the RNNUnit module. The description of the inputs and expected outputs is specified below:
Inputs
- x (effective batch size, input size)
- Input at the current time step.
- NOTE: For now, interpret effective batch size the same as regular batch size. The difference will become apparent later in the homework. For the definition of effective batch size, check the Appendix and/or Figure 5.
- h (effective batch size, hidden size)
- Hidden state generated by the previous time step, $h_{t-1}$
Outputs
- h_prime: (effective batch size, hidden size)
- New hidden state generated at the current time step, $h_t'$
NOTE: As a matter of convention, if you apply certain reshape/transpose operations while using self.weight_ih, then please do the same for self.weight_hh. This is important to note because self.weight_hh is square and is therefore open to multiple interpretations of how to use it.
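As a shape sanity check, here is what Equation 1 looks like at the plain NumPy level, assuming the PyTorch shape convention weight_ih: (hidden size, input size) and weight_hh: (hidden size, hidden size). This is only a sketch of the arithmetic; your actual forward must go through your Tensor operations so that autograd can track the computation.

```python
import numpy as np

def rnn_unit_forward_np(x, h, W_ih, b_ih, W_hh, b_hh):
    # x: (effective batch size, input size), h: (effective batch size, hidden size)
    # Equation 1: h'_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    return np.tanh(x @ W_ih.T + b_ih + h @ W_hh.T + b_hh)

batch, in_size, hid_size = 5, 10, 8
x = np.random.randn(batch, in_size)
h = np.zeros((batch, hid_size))
W_ih, b_ih = np.random.randn(hid_size, in_size), np.zeros(hid_size)
W_hh, b_hh = np.random.randn(hid_size, hid_size), np.zeros(hid_size)
print(rnn_unit_forward_np(x, h, W_ih, b_ih, W_hh, b_hh).shape)  # (5, 8)
```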
1.2 Detour 1: Cat, Slice, Unsqueeze
In the following section we implement some necessary functions which will prove to be extremely handy while completing the rest of the homework. These are namely: Cat, Slice and Unsqueeze.
1.2.1 Cat (7 points)
Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty. Please refer to the PyTorch documentation for a better understanding. If you are not into documentation, then please refer back to Recitation 0, where this was covered as well.
First implement the corresponding Cat class in mytorch/nn/functional.py. This will be a subclass of the Function class.

Next implement a helper function cat in mytorch/tensor.py to call the concatenation operation on a list of sequences. This should in turn correctly call the corresponding Function sub-class. Below is a description of the required inputs and outputs.
Inputs
- seq: list of tensors
- The list basically contains the sequences we want to concatenate
- dim: (int, default=0)
- The dimension along which we are supposed to concatenate the tensors in the list seq
Outputs Tensor:
- The concatenated tensor
NOTE: You don't need to add anything to the Tensor class in mytorch/tensor.py with respect to the Cat operation.
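To see what the forward and backward of a concatenation have to compute, here is a minimal NumPy-level sketch (the function names are illustrative and independent of the Function/Tensor API that your Cat class wraps): the forward is a plain np.concatenate, and the backward splits the upstream gradient back into pieces matching the original input shapes.

```python
import numpy as np

def cat_forward(arrays, dim=0):
    # Forward: concatenate the raw ndarrays along `dim`.
    return np.concatenate(arrays, axis=dim)

def cat_backward(grad_output, arrays, dim=0):
    # Backward: split the upstream gradient at the boundaries of the
    # original arrays, so each input gets a gradient of its own shape.
    sizes = [a.shape[dim] for a in arrays]
    split_points = np.cumsum(sizes)[:-1]
    return np.split(grad_output, split_points, axis=dim)

# Example: three (T_i, K) sequences concatenated along dim 0.
seqs = [np.ones((4, 3)), np.ones((2, 3)), np.ones((5, 3))]
out = cat_forward(seqs, dim=0)             # shape (11, 3)
grads = cat_backward(np.ones_like(out), seqs, dim=0)
print([g.shape for g in grads])            # [(4, 3), (2, 3), (5, 3)]
```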
1.2.2 Slice (5 points)
Despite being worth only 5 points, implementing this operation takes your Tensor class to a whole new level. With this operation up and running, you can index your Tensor class just like you index NumPy arrays/PyTorch tensors, while autograd takes charge of calculating the required gradients (assuming you implemented the backward correctly).
First implement the corresponding Slice class in mytorch/nn/functional.py. This will be a subclass of the Function class.

Next add the __getitem__ method in your Tensor class in mytorch/tensor.py, which in turn calls the Slice Function sub-class.
HINT
The implementation of a slicing operation from scratch may appear to be a daunting task, but we will employ a cool trick to get this done in just a few lines of code. Whenever we try to slice a segment of a given tensor using the [] notation, Python creates a key (depending on what you provide within []) and passes this to the __getitem__ function. This key is used by the class to provide the appropriate result to the user. The key can be an integer, a Python slice object, a list, etc., depending on how we slice our object. This is the principle implemented in NumPy for slicing ndarrays.

For the purpose of our implementation, we will leverage this fact to make our task easy. Once we create a __getitem__ method for our Tensor class, every time we slice our tensors the __getitem__ method will be invoked with the appropriate key. Given that the intention of slicing our tensor is to get the appropriate segment of the underlying ndarray (the data attribute) as a Tensor object, can you use this key to complete the task?
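A minimal NumPy-level sketch of that idea, with illustrative function names (your Slice(Function) wraps the same two steps inside your autograd framework): the forward simply lets NumPy interpret the key, and the backward scatters the upstream gradient into a zero array of the input's shape at the positions selected by that same key.

```python
import numpy as np

def slice_forward(data, key):
    # Forward: let NumPy interpret the key exactly as it would for an ndarray.
    return data[key]

def slice_backward(grad_output, input_shape, key):
    # Backward: gradient w.r.t. the input is zero everywhere except at the
    # sliced positions, which receive the upstream gradient.
    grad = np.zeros(input_shape, dtype=grad_output.dtype)
    grad[key] = grad_output
    return grad

x = np.arange(12.0).reshape(3, 4)
key = (slice(1, 3), 2)                      # what Python builds for x[1:3, 2]
y = slice_forward(x, key)                   # shape (2,)
g = slice_backward(np.ones_like(y), x.shape, key)
print(y.shape, g.sum())                     # (2,) 2.0
```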
1.2.3 Unsqueeze (3 points)
Returns a new tensor with a dimension of size one inserted at the specified position. Please refer to the PyTorch documentation for a better understanding. If you are not into documentation, then please refer back to Recitation 0, where this was covered as well.
Add the unsqueeze method in your Tensor class in mytorch/tensor.py.
HINT: Why is this only worth 3 points? Well, if you look closely, you might be able to use a function you implemented previously without any extra effort. The name of the function you might need begins with R in numpy.
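If it helps to see the expected behavior concretely, this is what unsqueeze should do to shapes (demonstrated here with plain NumPy for illustration; your Tensor method should produce the same shapes while staying inside autograd):

```python
import numpy as np

a = np.zeros((4, 3))
print(np.expand_dims(a, 0).shape)   # (1, 4, 3): size-one dim inserted at position 0
print(np.expand_dims(a, 1).shape)   # (4, 1, 3): size-one dim inserted at position 1
print(np.expand_dims(a, 2).shape)   # (4, 3, 1): size-one dim inserted at position 2
```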
1.3 Detour 2: pack_sequence, unpack_sequence
What's all the fuss about handling variable length samples in a batch?
This section is devoted to the construction of objects of class PackedSequence which help us create batches in the context of RNNs.
As you might have guessed by now, handling variable length sequences in a batch requires way more machinery than what exists in a simple RNN. In this section we will create all the utility functions that will help us with the handling of variable length sequences in a single batch.
Before we proceed, let's understand an important class that packages our packed sequences in a form which makes it easier for us to deal with them. This will also help you better understand why handling variable length sequences in a batch can be a tedious task.
Refer to Appendix for more details.
Please use the Tensor Slicing and Cat operations you implemented in the previous sections while implementing these functions.
1.3.1 pack_sequence (15 points)
In mytorch/nn/util.py implement the method pack_sequence.
This function, as the name suggests, packs a list of variable length Tensors into an object of class PackedSequence. A list of tensors is given (as shown in Figure 2) which is to be converted into a PackedSequence object (detailed class description below). This object holds the packed 2D tensor which is later fed into the RNN module for training and testing. Refer to Figures 2, 3 and 4 to better understand what pack_sequence does.
Figure 2: List of tensors we want to pack
Figure 3: First we sort the list in descending order based on the number of timesteps in each sample
Figure 4: Final Packed 2d Tensor
Class: PackedSequence Attributes
- data
- The actual packed tensor. In the context of this homework this is a 2D tensor. Refer to Figure 4.
- sorted_indices
- 1d ndarray (numpy n-dimensional array) of integers containing the indices of elements in the original list when they are sorted based on the number of time steps in each sample. Refer to Figure 2 and Figure 3.
- batch_sizes
- 1d ndarray of integers where the ith element represents the number of samples in the original list having at least (i+1) timesteps. Refer to Figure 4 for more clarity.
Function: pack_sequence Inputs
- seq: list of tensors
The list contains tensors representing individual instances, each having a variable length.
Outputs
- PackedSequence: the resulting PackedSequence object
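A small worked example in plain NumPy (with hypothetical per-sample arrays) may make the three attributes concrete, assuming the timestep-major layout shown in Figure 4: the packed data holds the timestep-0 rows of every sorted sample, then the timestep-1 rows of the samples that are long enough, and so on.

```python
import numpy as np

# Hypothetical list: sample i has shape (T_i, K) with K = 2 features.
seq = [np.full((3, 2), 0.0),   # sample 0: 3 timesteps
       np.full((1, 2), 1.0),   # sample 1: 1 timestep
       np.full((2, 2), 2.0)]   # sample 2: 2 timesteps

lengths = np.array([s.shape[0] for s in seq])              # [3, 1, 2]
sorted_indices = np.argsort(-lengths)                      # [0, 2, 1]
max_len = int(lengths.max())
# batch_sizes[i] = number of samples having at least (i + 1) timesteps
batch_sizes = np.array([(lengths >= t + 1).sum() for t in range(max_len)])  # [3, 2, 1]

# data: for each timestep t, the t-th row of every (sorted) sample long enough
rows = [seq[j][t] for t in range(max_len)
        for j in sorted_indices if lengths[j] > t]
data = np.stack(rows)                                      # shape (6, 2)
print(sorted_indices, batch_sizes, data.shape)
```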
1.3.2 unpack_sequence
In mytorch/nn/util.py implement the method unpack_sequence.
This function unpacks a given PackedSequence object into a list of tensors that were used to create it. This is also a part of the class PackedSequence.
Function: unpack_sequence Inputs
- ps: PackedSequence
Outputs
- list of Tensors:
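Unpacking is the inverse walk over batch_sizes. Below is a minimal NumPy sketch (function name illustrative) that recovers the per-sample arrays of the packing example above in their original order; your implementation should do the same through Tensor Slicing and Cat so gradients flow.

```python
import numpy as np

def unpack_np(data, batch_sizes, sorted_indices):
    # Collect, for each sorted sample, its row from every timestep slice of `data`.
    pieces = [[] for _ in sorted_indices]
    offset = 0
    for bs in batch_sizes:                  # one slice of `data` per timestep
        chunk = data[offset:offset + bs]    # rows of the bs longest samples
        for k in range(bs):
            pieces[k].append(chunk[k])
        offset += bs
    # Re-order back to the original list order.
    out = [None] * len(sorted_indices)
    for k, orig_idx in enumerate(sorted_indices):
        out[orig_idx] = np.stack(pieces[k])
    return out
```

Continuing the pack_sequence example, unpack_np(data, batch_sizes, sorted_indices) returns three arrays of shapes (3, 2), (1, 2) and (2, 2) in the original order.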
1.4 TimeIterator
In mytorch/nn/rnn.py implement the forward method for the TimeIterator.

For a given input, this class iterates through time by processing the entire sequence of timesteps. It can be thought of as representing a single layer for a given basic unit (interpret this as RNNUnit for now) which is applied at each time step.
Working of forward method in TimeIterator
- This module's forward method is tasked with receiving a batch of samples packed in the PackedSequence form.
- The method runs the section of the input corresponding to a single time-step across the batches through an RNNUnit.
- The hidden state returned by the RNNUnit is collected, and then the section of the input corresponding to the next time-step across the batches is fed through the RNNUnit along with the previously collected hidden state (from the last time-step) to generate a new hidden state.
- This process is done iteratively till all time-steps are exhausted for each sample in the batch.
- Follow Figures 5, 6, 7 and 8 to get a better understanding of how TimeIterator processes a given input PackedSequence.
The inputs and outputs of the forward method are given below; a NumPy-level sketch of the iteration loop follows them.
Class: TimeIterator Forward Method Inputs
- input: PackedSequence
- hidden: Tensor (batch size, hidden size)
Outputs
- PackedSequence: hiddens at each timestep for each sample packaged as a packed sequence
- Tensor (batch size, hidden size): This is the hidden generated by the last time step for each sample, joined together. Samples are ordered in descending order based on the number of timesteps. This is a slight deviation from PyTorch.
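The iteration logic can be previewed at the plain NumPy level (a sketch only, with illustrative names; in your module the same steps must go through your Tensor Slicing and Cat operations so that autograd is preserved, and you may need to create a zero initial hidden state when none is passed in):

```python
import numpy as np

def time_iterate_np(data, batch_sizes, hidden, step_fn):
    # data: packed 2D array; batch_sizes: effective batch size per timestep;
    # hidden: (batch size, hidden size) initial state; step_fn: the basic unit.
    hiddens, offset = [], 0
    for bs in batch_sizes:
        x_t = data[offset:offset + bs]       # inputs for this timestep
        h_t = hidden[:bs]                    # only the samples still "alive"
        h_new = step_fn(x_t, h_t)            # one RNNUnit step
        hiddens.append(h_new)
        # samples that have already ended keep their last hidden state
        hidden = np.concatenate([h_new, hidden[bs:]], axis=0)
        offset += bs
    packed_hiddens = np.concatenate(hiddens, axis=0)   # first output (packed)
    return packed_hiddens, hidden                      # second output (last hiddens)
```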
Finally, in mytorch/nn/rnn.py create a class RNN as a sub-class of TimeIterator which instantiates the base class with basic_unit = RNNUnit. Please look at the code for more details.
Figure 5: Iteration 0 for the TimeIterator
Figure 6: Iteration 1 for the TimeIterator
Figure 7: Iteration 2 for the TimeIterator
Figure 8: The final output from the TimeIterator
2 GRU
In this section we will explore the world of GRU. We will heavily use the machinery built for RNN to make our lives easier and create a full-fledged working GRU.
2.1 GRU Unit
In mytorch/nn/gru.py implement the forward pass for a GRUUnit (though we follow a slightly different naming convention than the PyTorch documentation). The equations for a GRU cell are the following:
$r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})$ (2)
$z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})$ (3)
$n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))$ (4)
$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$ (5)
Please refer to (and use) the GRUUnit class attributes defined in the init method. You are not expected to define any extra attributes for a working implementation of this class.
The inputs to the GRUUnit forward method are x and h, represented as $x_t$ and $h_{t-1}$ in the equations above. These are the inputs at time t.
The output of the forward method is $h_t$ in the equations above.
You are required to implement the forward method of this module. The description of the inputs and expected outputs is specified below:
Inputs
- x (effective batch size, input size)
- Input at the current time step.
- For the definition of effective batch size, check the Appendix and/or Figure 5.
- h (effective batch size, hidden size)
- Hidden state generated by the previous time step, $h_{t-1}$
Outputs
- h_prime: (effective batch size, hidden size)
- New hidden state generated at the current time step, $h_t'$
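For a shape-level preview of Equations 2-5, here is the same arithmetic in plain NumPy, assuming each weight matrix has shape (hidden size, input size) or (hidden size, hidden size) as in PyTorch, and that σ is the logistic sigmoid. As with the RNNUnit, your actual forward must be written with your Tensor operations so gradients flow.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_unit_forward_np(x, h, Wir, bir, Whr, bhr, Wiz, biz, Whz, bhz, Win, bin_, Whn, bhn):
    r = sigmoid(x @ Wir.T + bir + h @ Whr.T + bhr)           # Eq. 2: reset gate
    z = sigmoid(x @ Wiz.T + biz + h @ Whz.T + bhz)           # Eq. 3: update gate
    n = np.tanh(x @ Win.T + bin_ + r * (h @ Whn.T + bhn))    # Eq. 4: candidate state
    return (1 - z) * n + z * h                               # Eq. 5: new hidden state
```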
2.2 GRU
Finally, in mytorch/nn/gru.py create a class GRU as a sub-class of TimeIterator which instantiates the base class with basic_unit = GRUUnit. Please look at the code for more details.
Appendix
A Glossary
- effective batch: the ith effective batch refers to the set of ith timesteps from each sample that are simultaneously fed to the RNNUnit in the (i-1)th iteration inside the TimeIterator
- effective batch size: the number of samples in an effective batch. The effective batch size of the ith effective batch is equal to the number of samples containing at least (i+1) timesteps
B What's all the fuss about handling variable length samples in one batch?
B.1 Importance of batching in Deep Learning
When training a neural network, we stack together a number of inputs (a.k.a. batching) and pass them together for a single training iteration. From a computational standpoint this helps us make the most of GPUs by parallelizing this computation (done for you by various frameworks such as PyTorch). Moreover, ideas like batch normalization fundamentally depend on the existence of batches instead of using single samples for training. Therefore there is no denying the importance of batching in the deep learning world.
B.2 Batching for MLPs and CNNs
Life in a world with only MLPs and CNNs was simpler from a batching perspective. Each input sample had exactly the same shape. These input samples, in the form of tensors (each with the same shape), could then be stacked together along a new dimension to create a higher-dimensional tensor which then becomes our batch. For example, consider a simple 1-D input where each sample has 20 features. We now want to create a batch of 256 such samples. Each sample has shape (20,); the samples are then stacked along a new dimension (this becomes dimension 0) to provide a batch of shape (256, 20), where the first dimension represents the number of samples in the batch while the second dimension represents the number of features.
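Concretely, the stacking described above is a one-liner in NumPy:

```python
import numpy as np

samples = [np.random.randn(20) for _ in range(256)]  # 256 samples, 20 features each
batch = np.stack(samples, axis=0)                    # shape (256, 20)
print(batch.shape)
```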
B.3 Batching in RNNs
With RNNs, this becomes very tricky. Each sample in this context has a temporal dimension (at least in the simplest case). The number of features at each time step remains fixed for all the samples. Let this be K for the purpose of our discussion. Therefore each sample i has Ti time steps, where at each time step we have K features. The ith sample can then be represented by a tensor of shape (Ti, K). Given that the first dimension is of variable length, there is no simple way for us to create batches as we did for MLPs and CNNs. One might argue for fixed time-step inputs, but that severely limits the power of RNNs/GRUs/LSTMs and reduces them to a large fixed MLP. Therefore batching in RNNs requires a special setup, which we provide via PackedSequences.