# BMI 826 / CS 838 Homework Assignment 2


# 1 Overview

This assignment is about using convolutional neural networks for image classification. You will implement, design and train deep convolutional networks for scene recognition using PyTorch, an open source deep learning platform. Moreover, you will take a closer look at the learned network by (1) identifying important image regions for the classification and (2) generating adversarial samples that confuse your model. This assignment is team-based. A team can have up to 3 students.

# 2 Setup

• Install Anaconda. We recommend using Conda to manage your packages.
• The following packages are needed: PyTorch (1.0.1 with GPU support), OpenCV3, NumPy, Pillow and TensorboardX. You are responsible for installing them.
• For the visualization of the results, you will need Tensorboard and TensorFlow (a dependency of Tensorboard). You don’t need TensorFlow-gpu in this case.
• You can debug your code and run experiments on CPUs. However, training a neural network is very expensive on CPUs. We recommend using GPU computing for this project. Please set up your team’s cloud instance. Do remember to shut down the instance when it is not in use!
• You will need to fill in the missing code in:

./code/student_code.py

• You will need to submit your code, results and a writeup. You can generate the submission once you’ve finished the assignment using:

python ./zip_submission.py

# 3 Details

## 3.1 Understanding Convolutions

In this part, you will implement the 2D convolution operation, the fundamental component of deep convolutional neural networks. Specifically, a 2D convolution is defined as

$$Y = W *_S X + B \tag{1}$$

• Input: $X$ is a 2D feature map of size $C_i \times H_i \times W_i$ (following PyTorch’s convention). $H_i$ and $W_i$ are the height and width of the 2D map and $C_i$ is the number of input feature channels.
• Weight: $W$ defines the convolution filters and is of size $C_o \times C_i \times K \times K$, where $K$ is the kernel size. For this part, we only consider square filters. $W$ is a parameter that will be learned from data.
• Stride: $*_S$ is the convolution operation with stride $S$. $S$ is the step size of the sliding window when $W$ convolves with $X$. For this part, we only consider equal stride size along the height and width.
• Bias: $B$ is the bias term of size $C_o$. $B$ is added to every spatial location of the $H_o \times W_o$ output after the convolution. Again, $B$ is a parameter that will be learned from data.
• Padding: Padding is often applied before the convolution. Again, we only consider equal padding along all sides of the feature map. A (zero) padding of size $P$ adds zero-valued features to each side of the 2D map.
• Output: $Y$ is the output feature map of size $C_o \times H_o \times W_o$, where $H_o = \lfloor (H_i + 2P - K)/S \rfloor + 1$ and $W_o = \lfloor (W_i + 2P - K)/S \rfloor + 1$.
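The output-size rule can be checked with a few lines of arithmetic. The sketch below is a minimal helper, not part of the starter code: the name `conv_output_size` is hypothetical, and it assumes the floor-division convention that PyTorch's `Conv2d` uses.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    padded = in_size + 2 * padding
    if padded < kernel_size:
        raise ValueError("kernel does not fit inside the padded input")
    # Number of window positions: floor((H + 2P - K) / S) + 1
    return (padded - kernel_size) // stride + 1


# A 3x3 kernel with stride 1 and padding 1 preserves the spatial size.
same = conv_output_size(128, kernel_size=3, stride=1, padding=1)
# A 7x7 kernel with stride 2 and padding 3 halves it.
half = conv_output_size(128, kernel_size=7, stride=2, padding=3)
print(same, half)
```

The same function applies to both height and width because stride and padding are assumed equal along both axes.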

Helper Code: We have provided you with helper functions for the implementation (./code/student_code.py). You will need to fill in the missing code in the class CustomConv2DFunction. You can use the fold / unfold functions and any matrix / tensor operations provided by PyTorch, except the convolution functions. You do not need to modify the code in the class CustomConv2d. This is the module wrapper for your code.

Requirements: You will need to implement both the forward and backward propagation for this 2D convolution operation. The implementation should work with any kernel size K, input and output feature channels Ci/Co, stride S and padding P. Importantly, your implementation needs to compute Y given input X and parameters W and B, and the gradients with respect to X and W. All derivations of the gradients can be found in our course material, except the gradient with respect to B (provided). In your writeup, please describe your implementation.
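One common way to express the forward pass with `unfold` is to turn every sliding window into a column and apply the filters as a single matrix product. The sketch below illustrates that idea only; it is not the reference solution, the function name `conv2d_forward_unfold` is hypothetical, it assumes a batched input of shape N × Ci × Hi × Wi, and it leaves the backward pass out.

```python
import torch
import torch.nn.functional as F


def conv2d_forward_unfold(x, weight, bias, stride=1, padding=0):
    """Forward pass of a 2D convolution built from unfold + matmul."""
    n, _, h_in, w_in = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1

    # Each column holds one flattened K x K window across all input channels:
    # shape (N, C_in * K * K, H_out * W_out).
    cols = F.unfold(x, kernel_size=k, padding=padding, stride=stride)
    # Flatten each filter to a row: shape (C_out, C_in * K * K).
    w_mat = weight.reshape(c_out, -1)
    # Broadcasted matrix product gives (N, C_out, H_out * W_out).
    out = w_mat @ cols
    # Add the per-channel bias to every spatial location.
    out = out + bias.view(1, -1, 1)
    return out.view(n, c_out, h_out, w_out)


torch.manual_seed(0)
x = torch.randn(2, 3, 9, 9)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)
ours = conv2d_forward_unfold(x, w, b, stride=2, padding=1)
# F.conv2d is used here only as a reference for comparison.
ref = F.conv2d(x, w, b, stride=2, padding=1)
print(torch.allclose(ours, ref, atol=1e-5))
```

Here `F.conv2d` serves only as a check, in line with the rule that the implementation itself must not call PyTorch's convolution functions.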

Testing Code: How can you make sure that your implementation is correct? You can compare your forward / backward propagation results with PyTorch’s own Conv2d implementation. You can also compare your gradients with the numerical gradients. We included sample testing code in ./code/test_conv.py. Please make sure your code can pass the test.
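To illustrate what comparing against numerical gradients means, the sketch below estimates a gradient by central finite differences and compares it with autograd. It is independent of the provided test script: the helper name `numerical_grad` is hypothetical, double precision is assumed to keep rounding error small, and `F.conv2d` stands in for the operation under test.

```python
import torch
import torch.nn.functional as F


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of d f(x) / d x for a scalar-valued f."""
    x = x.detach().clone().contiguous()
    grad = torch.zeros_like(x)
    flat_x, flat_g = x.view(-1), grad.view(-1)  # views share storage
    with torch.no_grad():
        for i in range(flat_x.numel()):
            orig = flat_x[i].item()
            flat_x[i] = orig + eps
            f_plus = f(x).item()
            flat_x[i] = orig - eps
            f_minus = f(x).item()
            flat_x[i] = orig  # restore the perturbed entry
            flat_g[i] = (f_plus - f_minus) / (2 * eps)
    return grad


torch.manual_seed(0)
x = torch.randn(1, 2, 5, 5, dtype=torch.float64)
w = torch.randn(3, 2, 3, 3, dtype=torch.float64)


def loss_fn(t):
    # Scalar loss: sum of all outputs of the operation under test.
    return F.conv2d(t, w, stride=2, padding=1).sum()


numeric = numerical_grad(loss_fn, x)
x_auto = x.clone().requires_grad_(True)
loss_fn(x_auto).backward()
max_err = (numeric - x_auto.grad).abs().max().item()
print(max_err)
```

PyTorch also ships `torch.autograd.gradcheck`, which automates this kind of comparison for double-precision inputs.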

## 3.2 Design and Train a Convolutional Neural Network

In the second part, you will design and train a convolutional neural network for scene classification on the MiniPlaces dataset.

MiniPlaces Dataset: MiniPlaces is a scene recognition dataset developed by MIT. This dataset has 120K images from 100 scene categories. The categories are mutually exclusive. The dataset is split into 100K images for training, 10K images for validation and 10K for testing. You can download the dataset by running download_dataset.sh in the assignment folder. The images and annotations will be located under ./data. We will use top-1 and top-5 accuracy as the performance metrics. For more details about the dataset, please refer to its GitHub page https://github.com/CSAILVision/miniplaces.
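For reference, top-K accuracy counts a prediction as correct when the true label is among the K highest-scoring classes. The following is a minimal pure-Python sketch of that definition; the function name `topk_accuracy` and the toy scores are illustrative assumptions, not the evaluation code used by the assignment.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


# Toy example with 3 samples and 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```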

Helper Code: We have provided you with helper code for training and testing a deep model (./code/main.py). You will have to run this script many times, but you are unlikely to need to modify it. For your reference, a simple neural network is implemented by the class SimpleNet in ./code/student_code.py. You will need to modify this class for this part of the project.

Requirements: You will design and train a deep network for scene recognition. Your model must be trained from scratch using the training set. No other source of information is allowed, e.g., using labels of the validation set for training, or using model parameters that are learned from ImageNet. This part includes 4 different sections.

• Section 0: Let us start by training our first deep network from scratch! You do not need to write any code in this section; we provide the dataloader and a simple network you can use. You can start by running

python ./main.py ../data

You will need to use GPU computing for this training. It will take a few hours and give you a model with around 47% top-1 accuracy on the validation set. Do remember to run your training inside a terminal multiplexer session, e.g., tmux, so that your process won’t get killed when your SSH session is disconnected. You can also use watch -n 0.1 nvidia-smi to get a rough estimate of GPU utilization and memory consumption.

Once the training is complete, your best model will be saved as ./models/model_best.pth.tar. You can evaluate this model by running

python ./main.py ../data --resume=../models/model_best.pth.tar -e

• Section 1: While waiting for the training of the model, you can read the code and understand the training. Please describe the training process implemented in our code in your writeup. You should address the following questions: Which loss function/optimization method is used? How is the learning rate scheduled? Is there any regularization used? Why is top-K accuracy a good metric for this dataset?
• Section 2: Let us try to use our own convolution to replace PyTorch’s version and train the model for 10 epochs. This can be done by running

python ./main.py ../data --epoches=10 --use-custom-conv

How is your implementation different from PyTorch’s convolution in terms of training memory, speed and convergence rate? Why? Describe your findings in the writeup.

• Section 3: Now let us look at our simple network. The current version is a combination of convolution, ReLU, max pooling and fully connected layers. Your goal is to design a better network for this recognition task. There are a couple of things you can explore here. For example, you can add more convolutional layers [5], yet the model might start to diverge during training. This divergence can be avoided by adding residual connections [2] and/or batch normalization [3]. You might also want to try the multi-branch architecture of Google’s Inception networks [7]. You can also tweak the hyper-parameters for training, e.g., learning rate, weight decay and number of training epochs. These hyper-parameters can be passed to main.py in the terminal. You should implement your network in student_code.py and call main.py for training. Please justify your design of the model and/or the training, and present your results in the writeup. These results include all training curves and training/validation accuracy.
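As an illustration of the residual connections and batch normalization mentioned in Section 3, here is a minimal sketch of a basic residual block in the spirit of [2]. The class name `ResidualBlock`, the channel count and the input size are assumptions chosen for the example; this is not part of SimpleNet or the starter code.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 conv + batch norm layers with an identity skip connection."""

    def __init__(self, channels):
        super().__init__()
        # Bias is omitted because batch norm adds its own learnable shift.
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection adds the input back before the final ReLU.
        return self.relu(out + x)


block = ResidualBlock(16)
y = block(torch.randn(2, 16, 32, 32))
print(y.shape)
```

Because the block keeps the number of channels and the spatial size unchanged, several of them can be stacked between downsampling layers.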

Monitoring the Training: All intermediate results during training, including training loss, learning rate and train/validation accuracy, are logged into files under ./logs. You can monitor and visualize these variables by using

tensorboard --logdir=../logs

We recommend copying the logs folder to a local machine and using Tensorboard locally to view the curves. This way, you avoid setting up a Tensorboard server on the cloud. Please include the curves of your training loss and train/val accuracy in your writeup. Do these curves look normal to you? Please provide your discussion in the writeup.

[Bonus] MiniPlaces Challenge: You can choose to upload your final model and thus participate in our MiniPlaces challenge. This challenge will be judged by evaluating your model on a held-out test set. If you decide to do so, please copy your model_best.pth.tar to the results folder. To make this challenge a bit more challenging, we do have some constraints for your model. First, your model has to be trained in under 4 hours using a K40 GPU on the cloud. We do not have a way to strictly enforce this rule, yet please keep this number in mind. Second, your model (tar file) size has to be smaller than 10MB. As a point of reference, our SimpleNet is only 5.5MB with a top-1 accuracy of 47%. Teams that are ranked top 3 in this challenge will receive 2 bonus points (out of the 15pt for this homework assignment). We encourage you to take this challenge.
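A quick parameter count helps judge whether a design can fit the 10MB limit before training it. The sketch below is a rough estimate only: it assumes 32-bit floats, ignores anything else stored in the checkpoint file (such as optimizer state or batch norm statistics), and the layer sizes describe a hypothetical network, not SimpleNet.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of parameters in a K x K convolution layer."""
    return c_out * c_in * k * k + (c_out if bias else 0)


def linear_params(n_in, n_out, bias=True):
    """Number of parameters in a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def size_mb(num_params, bytes_per_param=4):
    """Approximate storage in MB, assuming 4 bytes (float32) per parameter."""
    return num_params * bytes_per_param / (1024 ** 2)


# Hypothetical network: three conv layers followed by a 100-way classifier.
total = (
    conv_params(3, 64, 7)
    + conv_params(64, 128, 3)
    + conv_params(128, 256, 3)
    + linear_params(256, 100)
)
print(total, round(size_mb(total), 2))
```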

## 3.3 Attention and Adversarial Samples

In the final part, we will look at attention maps and adversarial samples. They present two critical aspects of deep neural networks: interpretation and robustness, and thus will help you gain insight about these networks.

Helper Code: Helper code is provided in ./code/main.py and student_code.py for visualizing attention maps and generating adversarial samples. For attention maps, you will need to fill in the missing code in the class GradAttention. For adversarial samples, you need to complete the class PGDAttack.

Requirements: You will implement methods for generating attention maps and adversarial samples.

• Attention: Suppose you have a trained model. If you minimize the loss of the predicted label and compute the gradient of the loss w.r.t. the input, the magnitude of a pixel’s gradient indicates how important that pixel is for the decision. You can create a 2D attention map by (1) computing the input gradient of the loss of the predicted label (the most confident prediction); (2) taking the absolute values of the gradients; and (3) picking the maximum value across the three color channels. This method was discussed in [6]. Once you have finished the coding, you can run

python ./main.py ../data --resume=../models/model_best.pth.tar -e -v

This command will evaluate your trained model (assuming model_best.pth.tar) and visualize the attention maps. All attention maps will be saved under ./logs. Again, you can view them with Tensorboard:

tensorboard --logdir=../logs

You will now see a tab named “Image”. You can move the slider on top of the image to see samples from different batches. You can also zoom in on the image by clicking on it. Please include and discuss the visualization in your writeup.
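The three attention steps described above can be sketched in a few lines of PyTorch. This is an illustration under assumptions, not the GradAttention class from the helper code: the function name `gradient_attention` is hypothetical, cross-entropy is assumed as the loss, and a tiny stand-in network replaces the trained model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gradient_attention(model, images):
    """Input-gradient attention map of shape (N, H, W) for a batch of images."""
    model.eval()
    images = images.detach().clone().requires_grad_(True)
    logits = model(images)
    # Use the most confident prediction as the target label.
    pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pred)
    # (1) gradient of the loss w.r.t. the input pixels
    (grad,) = torch.autograd.grad(loss, images)
    # (2) absolute value, (3) maximum over the color channels
    return grad.abs().max(dim=1)[0]


# Tiny stand-in classifier; a real run would load the trained model instead.
torch.manual_seed(0)
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 10),
)
attention = gradient_attention(toy_model, torch.randn(2, 3, 8, 8))
print(attention.shape)
```

Using `torch.autograd.grad` rather than `backward()` keeps the model's parameter gradients untouched, which is convenient when the same model is evaluated afterwards.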

# 4 Writeup

For this assignment, and all other assignments, you must submit a project report in PDF. Every team member should send the same copy of the report. Please clearly identify the contribution of all the team members. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. In the case of this project, we have included detailed instructions for the writeup in each part of the project. You can also discuss anything extra you did. Feel free to add any other information you feel is relevant. A good writeup doesn’t just show results, it tries to draw some conclusions from your experiments.

# 5 Handing in

This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5%. The folder you hand in must contain the following:

• code/ – directory containing all your code for this assignment
• writeup/ – directory containing your report for this assignment.
• results/ – directory containing your results. Please include your model if you decide to participate in our challenge.

Do not use absolute paths in your code (e.g. /user/classes/proj1). Your code will break if you use absolute paths and you will lose points because of it. Simply use relative paths as the starter code already does. Do not turn in the data / logs / models folder. Hand in your project as a zip file through Canvas. You can create this zip file using python zip_submission.py.

# References

[1] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[3] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[4] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[5] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR, 2014.
[6] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
