This assignment is about using convolutional neural networks for image classification. You will implement, design and train deep convolutional networks for scene recognition using PyTorch, an open source deep learning platform. Moreover, you will take a closer look at the learned network by (1) identifying important image regions for the classification and (2) generating adversarial samples that confuse your model. This assignment is team-based. A team can have up to 3 students.
- Install Anaconda. We recommend using Conda to manage your packages.
- The following packages are needed: PyTorch (1.0.1 with GPU support), OpenCV3, NumPy, Pillow and TensorboardX. You are responsible for installing them.
- For the visualization of the results, you will need Tensorboard and TensorFlow (a dependency of Tensorboard). You don’t need TensorFlow-gpu in this case.
- You can debug your code and run experiments on CPUs. However, training a neural network is very expensive on CPUs. We recommend using GPU computing for this project. Please set up your team’s cloud instance. Do remember to shut down the instance when it is not in use!
- You will need to download the MiniPlaces dataset for Part II & III of the project. We have included the downloading script. Run download_dataset.sh in the assignment folder. All data will be downloaded under ./data/.
- You will need to fill in the missing code in:
- You will need to submit your code, results and a writeup. You can generate the submission once you’ve finished the assignment using:
python ./zip_submission.py
This assignment has three parts. An autograder will be used to grade some parts of the assignment. Please follow the instructions closely.
3.1 Understanding Convolutions
In this part, you will need to implement the 2D convolution operation, the fundamental component of deep convolutional neural networks. Specifically, a 2D convolution is defined as
$$Y = W *_S X + B \qquad (1)$$
- Input: $X$ is a 2D feature map of size $C_i \times H_i \times W_i$ (following PyTorch’s convention). $H_i$ and $W_i$ are the height and width of the 2D map and $C_i$ is the number of input feature channels.
- Weight: $W$ defines the convolution filters and is of size $C_o \times C_i \times K \times K$, where $K$ is the kernel size. For this part, we only consider square filters. $W$ is the parameter that will be learned from data.
- Stride: $*_S$ is the convolution operation with stride $S$. $S$ is the step size of the sliding window when $W$ convolves with $X$. For this part, we only consider equal stride size along the height and width.
- Bias: $B$ is the bias term of size $C_o$. $B$ is added to every spatial location of the $H_o \times W_o$ output map after the convolution. Again, $B$ is the parameter that will be learned from data.
- Padding: Padding is often used before the convolution. Again, we only consider equal padding along all sides of the feature map. A (zero) padding of size $P$ adds zero-valued features to each side of the 2D map.
- Output: $Y$ is the output feature map of size $C_o \times H_o \times W_o$, where $H_o = \lfloor (H_i + 2P - K)/S \rfloor + 1$ and $W_o = \lfloor (W_i + 2P - K)/S \rfloor + 1$.
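As a quick sanity check on the output size, the following minimal sketch computes the spatial size along one dimension. It assumes the standard floor-based formula used by PyTorch's convolution layers; the function name conv_output_size and the example sizes are illustrative and not part of the helper code.

```python
def conv_output_size(size_in: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension: floor((size_in + 2P - K) / S) + 1."""
    if kernel < 1 or stride < 1 or padding < 0:
        raise ValueError("kernel and stride must be >= 1, padding must be >= 0")
    padded = size_in + 2 * padding
    if padded < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Integer division implements the floor in the formula.
    return (padded - kernel) // stride + 1


# A 3x3 kernel with stride 1 and padding 1 keeps the spatial size unchanged.
print(conv_output_size(128, kernel=3, stride=1, padding=1))  # 128
# A 7x7 kernel with stride 2 and padding 3 halves it.
print(conv_output_size(128, kernel=7, stride=2, padding=3))  # 64
```

The same function applies to the height and the width, since this part only considers square filters with equal stride and padding.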
Helper Code: We have provided you helper functions for the implementation (./code/student_code.py). You will need to fill in the missing code in the class CustomConv2DFunction. You can use the fold / unfold functions and any matrix / tensor operations provided by PyTorch, except the convolution functions. You do not need to modify the code in the class CustomConv2d. This is the module wrapper for your code.
Requirements: You will need to implement both the forward and backward propagation for this 2D convolution operation. The implementation should work with any kernel size $K$, input and output feature channels $C_i$/$C_o$, stride $S$ and padding $P$. Importantly, your implementation needs to compute $Y$ given input $X$ and parameters $W$ and $B$, and the gradients $\partial Y / \partial X$ and $\partial Y / \partial W$. All derivations of the gradients can be found in our course material, except $\partial Y / \partial B$ (provided). In your writeup, please describe your implementation.
Testing Code: How can you make sure that your implementation is correct? You can compare your forward / backward propagation results with PyTorch’s own Conv2d implementation. You can also compare your gradients with the numerical gradients. We included sample testing code in ./code/test_conv.py. Please make sure your code can pass the test.
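The kind of check described above can be sketched as follows. This is not the provided test script: conv2d_via_unfold is a hypothetical stand-in that builds the forward pass from unfold and a matrix multiply and relies on autograd for the backward pass, whereas the assignment asks for an explicit backward pass. The tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F


def conv2d_via_unfold(x, weight, bias, stride=1, padding=0):
    """2D convolution forward pass built from unfold + matrix multiply.

    x: (N, C_i, H_i, W_i), weight: (C_o, C_i, K, K), bias: (C_o,)
    """
    n, _, h_in, w_in = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    # (N, C_i*K*K, L): each column is one flattened sliding window, L = h_out * w_out.
    cols = F.unfold(x, kernel_size=k, padding=padding, stride=stride)
    # (C_o, C_i*K*K) @ (N, C_i*K*K, L) broadcasts to (N, C_o, L).
    out = weight.view(c_out, -1) @ cols + bias.view(1, c_out, 1)
    return out.view(n, c_out, h_out, w_out)


torch.manual_seed(0)
# Double precision keeps the numerical gradient check well conditioned.
x = torch.randn(2, 3, 9, 9, dtype=torch.double, requires_grad=True)
w = torch.randn(4, 3, 3, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(4, dtype=torch.double, requires_grad=True)

ours = conv2d_via_unfold(x, w, b, stride=2, padding=1)
ref = F.conv2d(x, w, b, stride=2, padding=1)
forward_ok = torch.allclose(ours, ref)

# Compare analytic gradients against PyTorch's reference convolution.
g_ours = torch.autograd.grad(ours.sum(), (x, w, b))
g_ref = torch.autograd.grad(ref.sum(), (x, w, b))
backward_ok = all(torch.allclose(a, r) for a, r in zip(g_ours, g_ref))

# Compare analytic gradients against finite-difference (numerical) gradients.
numeric_ok = torch.autograd.gradcheck(
    lambda a, c, d: conv2d_via_unfold(a, c, d, stride=2, padding=1), (x, w, b)
)
print(forward_ok, backward_ok, numeric_ok)
```

The same three comparisons can be pointed at your own CustomConv2DFunction once its forward and backward passes are filled in.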
3.2 Design and Train a Convolutional Neural Network
In the second part, you will design and train a convolutional neural network for scene classification on the MiniPlaces dataset.
MiniPlaces Dataset: MiniPlaces is a scene recognition dataset developed by MIT. This dataset has 120K images from 100 scene categories. The categories are mutually exclusive. The dataset is split into 100K images for training, 10K images for validation and 10K for testing. You can download the dataset by running download_dataset.sh in the assignment folder. The images and annotations will be located under ./data. We will use top-1 and top-5 accuracy as the performance metrics. For more details about the dataset, please refer to their GitHub page https://github.com/CSAILVision/miniplaces.
Helper Code: We have provided you helper code for training and testing a deep model (./code/main.py). You will have to run this script many times but you are unlikely to modify this file. For your reference, a simple neural network is implemented by the class SimpleNet in ./code/student_code.py. You will need to modify this class for this part of the project.
Requirements: You will design and train a deep network for scene recognition. Your model must be trained from scratch using the training set. No other source of information is allowed, e.g., using labels of the validation set for training, or using model parameters that are learned from ImageNet. This part includes 4 different sections.
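To make the top-1/5 metric concrete, here is a minimal pure-Python sketch. The function name topk_accuracy and the toy scores are illustrative assumptions; the helper code may compute the metric differently, for example on batched tensors.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0, runner-up: class 1
    [0.2, 0.3, 0.5],  # highest score: class 2, runner-up: class 1
]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
print(topk_accuracy(scores, labels, k=2))  # 2 of 3 samples correct
```

Top-1 accuracy is the special case k = 1, and accuracy can only stay the same or increase as k grows.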
- Section 0: Let us start by training our first deep network from scratch! You do not need to write any code in this section, as we provide the dataloader and a simple network you can use. You can start by running
python ./main.py ../data
You will need to use GPU computing for this training. It will take a few hours and give you a model with around 47% top-1 accuracy on the validation set. Do remember to run your training inside a persistent terminal session, e.g., tmux, so that your process won’t get killed when your SSH session is disconnected. You can also use watch -n 0.1 nvidia-smi to get a rough estimate of GPU utilization and memory consumption.
Once the training is complete, your best model will be saved as ./models/model_best.pth.tar. You can evaluate this model by
python ./main.py ../data --resume=../models/model_best.pth.tar -e
- Section 1: While waiting for the training of the model, you can read the code and understand the training. Please describe the training process implemented in our code in your writeup. You should address the following questions: Which loss function/optimization method is used? How is the learning rate scheduled? Is there any regularization used? Why is top-K accuracy a good metric for this dataset?
- Section 2: Let us try to use our own convolution to replace PyTorch’s version and train the model for 10 epochs. This can be done by
python ./main.py ../data --epoches=10 --use-custom-conv
How is your implementation different from PyTorch’s convolution in terms of training memory, speed and convergence rate? Why? Describe your findings in the writeup.
- Section 3: Now let us look at our simple network. The current version is a combination of convolution, ReLU, max pooling and fully connected layers. Your goal is to design a better network for this recognition task. There are a couple of things you can explore here. For example, you can add more convolutional layers [6], yet the model might start to diverge during training. This divergence can be avoided by adding residual connections [2] and/or batch normalization [3]. You might also want to try the multi-branch architecture in Google Inception networks [7]. You can also tweak the hyper-parameters for training, e.g., learning rate, weight decay, training epochs etc. These hyper-parameters can be passed to main.py in the terminal. You should implement your network in student_code.py and call main.py for training. Please justify your design of the model and/or the training, and present your results in the writeup. These results include all training curves and training/validation accuracy.
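As one illustration of the design options mentioned in Section 3, the sketch below combines a residual connection with batch normalization in a single block. The class name ResidualBlock, the channel count and the layer arrangement are assumptions made for this example; they are not taken from SimpleNet and are only one of many reasonable designs.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch normalization and an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        # Padding of 1 keeps the spatial size, so the shortcut can be added directly.
        # The conv bias is dropped because batch normalization has its own shift.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Residual connection: add the input back before the final activation.
        return self.relu(out + x)


block = ResidualBlock(16)
x = torch.randn(2, 16, 32, 32)
y = block(x)
print(y.shape)
```

Because the block preserves the input shape, several of them can be stacked between the existing pooling layers without changing the rest of the network.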
Monitoring the Training: All intermediate results during training, including training loss, learning rate, and train/validation accuracy, are logged into files under ./logs. You can monitor and visualize these variables by using
tensorboard --logdir=../logs
We recommend copying the logs folder to a local machine and using Tensorboard locally for the curves. This way, you can avoid setting up a Tensorboard server on the cloud. Please include the curves of your training loss and train/val accuracy in your writeup. Do these curves look normal to you? Please provide your discussion in the writeup.
[Bonus] MiniPlaces Challenge: You can choose to upload your final model and thus participate in our MiniPlaces challenge. This challenge will be judged by evaluating your model on a hold-out test set. If you decide to do so, please copy your model_best.pth.tar to the results folder. To make this challenge a bit more challenging, we do have some constraints for your model. First, your model has to be trained in under 4 hours using a K40 GPU on the cloud. We do not have a way to strictly enforce this rule, yet please keep this number in mind. Second, your model (tar file) size has to be smaller than 10MB. As a point of reference, our SimpleNet is only 5.5MB with a top-1 accuracy of 47%. Teams that are ranked top 3 in this challenge will receive 2 bonus points (out of the 15pt for this homework assignment). We encourage you to take this challenge.
3.3 Attention and Adversarial Samples
In the final part, we will look at attention maps and adversarial samples. They represent two critical aspects of deep neural networks, interpretation and robustness, and thus will help you gain insight into these networks.
Helper Code: Helper code is provided in ./code/main.py and student_code.py for visualizing attention maps and generating adversarial samples. For attention maps, you will need to fill in the missing code in class GradAttention. For adversarial samples, you need to complete the class PGDAttack.
Requirements: You will implement methods for generating attention maps and adversarial samples.
- Attention: Suppose you have a trained model. If you minimize the loss of the predicted label and compute the gradient of the loss w.r.t. the input, the magnitude of a pixel’s gradient indicates how important that pixel is for the decision. You can create a 2D attention map by (1) computing the input gradient by minimizing the loss of the predicted label (most confident prediction); (2) taking the absolute values of the gradients; and (3) picking the maximum values across the three channels. This method was discussed in [5]. Once you have finished the coding, you can run
python ./main.py ../data --resume=../models/model_best.pth.tar -e -v
This command will evaluate your trained model (assuming model_best.pth.tar) and visualize the attention maps. All attention maps will be saved under ./logs. Again, you can use Tensorboard
tensorboard --logdir=../logs
Now you will see a tab named “Image”. You can scroll the slide bar on top of the image to see samples from different batches. You can also zoom in on the image by clicking on it. Please include and discuss the visualization in your writeup.
- Adversarial Samples: Interestingly, if you minimize the loss of a wrong label and compute the gradient of the loss w.r.t. the input, you can create adversarial samples that will confuse the model! This was first presented in [1]. Let us use the least confident label as a proxy for the wrong label. You will implement the Projected Gradient Descent (PGD) method in [4]. Specifically, PGD takes several steps of the fast gradient sign method, and each time clips the result to the ε-neighborhood of the input. You will need to be a bit careful with this implementation. You do not want PyTorch to record your gradient operations in the computation graph. Otherwise, it will create a graph that grows indefinitely over time. Again, you can call main.py once you complete the implementation
python ./main.py ../data --resume=../models/model_best.pth.tar -a -v
This command will generate adversarial samples on the validation set and try to attack your model. You can see how the accuracy drops (significantly!). Moreover, adversarial samples will be saved in the logs folder, and you can use Tensorboard to check them. This time, you will find tabs “Org Image” and “Adv Image”. Can you see the difference between the original images and the adversarial samples? Please discuss your implementation of PGD and present the results (accuracy drop and adversarial samples) in your writeup.
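The two procedures above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the interface of GradAttention or PGDAttack: the names grad_attention, pgd_attack and TinyNet, the values of epsilon, step_size and num_steps, and the use of cross-entropy as the loss are all hypothetical. The sketch also ignores clamping to a valid pixel range.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_attention(model, x):
    """Attention map (N, 1, H, W): channel-wise max of |d loss / d input|."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    pred = logits.argmax(dim=1)            # most confident label
    loss = F.cross_entropy(logits, pred)   # loss of the predicted label
    grad, = torch.autograd.grad(loss, x)   # gradient w.r.t. the input only
    return grad.abs().max(dim=1, keepdim=True)[0]


def pgd_attack(model, x, epsilon=0.01, step_size=0.002, num_steps=10):
    """PGD toward the least confident label within an L-infinity ball of radius epsilon."""
    x_orig = x.clone().detach()
    with torch.no_grad():
        target = model(x_orig).argmin(dim=1)  # least confident label as the wrong label
    x_adv = x_orig.clone()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Signed gradient step that lowers the loss of the wrong label.
            x_adv = x_adv - step_size * grad.sign()
            # Project back into the epsilon-neighborhood of the original input.
            x_adv = torch.max(torch.min(x_adv, x_orig + epsilon), x_orig - epsilon)
        # Detach so the computation graph does not grow across iterations.
        x_adv = x_adv.detach()
    return x_adv


class TinyNet(nn.Module):
    """Hypothetical stand-in for a trained classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(F.relu(self.conv(x)).mean(dim=(2, 3)))


torch.manual_seed(0)
model = TinyNet().eval()
images = torch.rand(4, 3, 16, 16)
attention = grad_attention(model, images)
adversarial = pgd_attack(model, images, epsilon=0.01)
max_change = (adversarial - images).abs().max().item()
print(attention.shape, max_change)
```

Both functions compute gradients with torch.autograd.grad on the input alone, so no gradients accumulate in the model parameters, and the update step runs under torch.no_grad() to keep it out of the graph.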
[Bonus] Adversarial Training: A deep model should be robust under adversarial samples. A possible solution to build this robustness is using adversarial training, as described in [1, 4]. The key idea is to generate adversarial samples and feed these samples into the network during training. To implement adversarial training, you can attach your PGD to the forward function in the SimpleNet (see the comments in the code for details). Unfortunately, this training can be 10 times more expensive than normal training. To accelerate this process, you can (1) reduce the number of steps in PGD and (2) reduce the number of epochs in training. Your goal is to show that in comparison to a model using normal training, your model using adversarial training has a better chance to survive adversarial attacks. Please discuss your experimental design, implementation and results in the writeup. Your team will receive a maximum of 2 bonus points (out of the 15pt for this homework assignment).
For this assignment, and all other assignments, you must submit a project report in PDF. Every team member should submit the same copy of the report. Please clearly identify the contribution of all the team members. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. In the case of this project, we have included detailed instructions for the writeup in each part of the project. You can also discuss anything extra you did. Feel free to add any other information you feel is relevant. A good writeup doesn’t just show results; it tries to draw some conclusions from your experiments.
5 Handing in
This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5%. The folder you hand in must contain the following:
- code/ – directory containing all your code for this assignment
- writeup/ – directory containing your report for this assignment.
- results/ – directory containing your results. Please include your model if you decide to participate in our challenge.
Do not use absolute paths in your code (e.g. /user/classes/proj1). Your code will break if you use absolute paths and you will lose points because of it. Simply use relative paths as the starter code already does. Do not turn in the data / logs / models folder. Hand in your project as a zip file through Canvas. You can create this zip file using python zip_submission.py.
References
[1] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[3] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[4] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[5] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR, 2014.
[6] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.