This part of the assignment makes use of Convolutional Neural Networks (CNNs). The previous part used hand-crafted features such as SIFT to represent images and then trained a classifier on top of them, so learning was a two-step procedure: image representation followed by classification. The method used here instead learns the features jointly with the classifier. Training CNNs roughly consists of three parts: (i) creating the network architecture, (ii) preprocessing the data, and (iii) feeding the data to the network and updating the parameters. Please follow the instructions and finish the tasks below. (Note: You do not need to strictly follow the structure/functions of the provided script.)
1 Session 1: Image Classification on CIFAR-10
1.1 Installation
First of all, you need to install PyTorch and relevant packages. In this session, we will use CIFAR-10 as the training and testing dataset.
CIFAR-10 (3-pts)
The relevant script is provided in Lab project part2.pynb. You need to run and modify the given code to show example images of CIFAR-10 and to describe its classes and images. (Please visualize at least one picture for each class.)
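As a starting point, here is a minimal sketch (not the provided notebook code) of loading CIFAR-10 through torchvision and showing one example per class; the data root and figure layout are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import torchvision
import torchvision.transforms as transforms

# Download the CIFAR-10 training split and convert images to tensors.
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transforms.ToTensor())
classes = trainset.classes  # ['airplane', 'automobile', ..., 'truck']

fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for class_idx, ax in enumerate(axes):
    # Find the first training image belonging to this class.
    img_idx = np.where(np.array(trainset.targets) == class_idx)[0][0]
    img, _ = trainset[img_idx]                  # 3x32x32 tensor in [0, 1]
    ax.imshow(img.permute(1, 2, 0).numpy())     # CHW -> HWC for matplotlib
    ax.set_title(classes[class_idx], fontsize=8)
    ax.axis('off')
plt.show()
```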
1.2 Architecture understanding
In this section, we provide two wrapped architecture classes defined with nn.Module. One is an ordinary two-layer network (TwolayerNet) with fully connected layers and ReLU, and the other is a Convolutional Network (ConvNet) using the structure of LeNet-5[2].
Architectures (5-pts)
- Complete the architecture of the TwolayerNet class, and complete the architecture of the ConvNet class using the structure of LeNet-5[2]; a minimal sketch of both is given after this list. (3-pts)
- Since you need to feed color images into these two networks, what is the kernel size of the first convolutional layer in ConvNet, and how many trainable parameters are there in the F6 layer (give the calculation process)? (2-pts)
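The sketch below assumes 3 × 32 × 32 CIFAR-10 inputs and 10 output classes; the hidden size of TwolayerNet and the use of max pooling and ReLU in ConvNet are assumptions, while the layer widths of ConvNet follow LeNet-5.

```python
import torch.nn as nn
import torch.nn.functional as F

class TwolayerNet(nn.Module):
    def __init__(self, input_dim=3 * 32 * 32, hidden_dim=512, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)                 # flatten to (batch, 3*32*32)
        return self.fc2(F.relu(self.fc1(x)))

class ConvNet(nn.Module):
    """LeNet-5-style network adapted to 3-channel inputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)   # C1: 3x32x32 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # C3: 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)         # C5 as a fully connected layer
        self.fc2 = nn.Linear(120, 84)                 # F6
        self.fc3 = nn.Linear(84, num_classes)         # output layer

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # S2
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # S4
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```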
1.3 Preparation of training
In the above section, we use the CIFAR10 dataset class from torchvision.datasets provided by PyTorch. In most cases, however, you need to prepare the dataset yourself. One way is to create a dataset class yourself and then use the DataLoader to make it iterable. After preparing the training and testing data, you also need to define the transform function for data augmentation and the optimizer for parameter updating.
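A minimal sketch of this pattern is shown below; the in-memory `images`/`labels` arrays, the particular augmentations, and the optimizer settings are all assumptions, not requirements of the assignment.

```python
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

class MyImageDataset(Dataset):
    """Wraps arrays of images and labels so DataLoader can iterate over them."""
    def __init__(self, images, labels, transform=None):
        self.images = images          # e.g. a numpy array of HxWxC uint8 images
        self.labels = labels          # e.g. a list/array of integer class labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img, label = self.images[idx], self.labels[idx]
        if self.transform is not None:
            img = self.transform(img)
        return img, label

# Example transform for data augmentation, plus an SGD optimizer for parameter updates.
train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
# train_set = MyImageDataset(images, labels, transform=train_transform)
# train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```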
1.4 Setting up the hyperparameters
Some parameters must be set properly before training a CNN. These parameters shape the training procedure: they determine how many images are processed at each step, how much the weights of the network are updated, and how many iterations the network runs until convergence. These parameters are called hyperparameters in the machine learning literature.
Hyperparameter Optimization and Evaluation (10-pts)
- Play with ConvNet and TwolayerNet yourself, set up the hyperparameters, and reach as high an accuracy as you can. You can modify the train function, DataLoader, transform, and optimizer as you like.
- You can also modify the architectures of these two nets. Let's add 2 more layers to TwolayerNet and ConvNet and show the results. (You can decide the size of these layers and where to add them.) Do you get higher performance? Explain why.
- Show the final results and describe what you've done to improve them. Describe and explain the influence of the hyperparameters on TwolayerNet and ConvNet.
- Compare and explain the differences between these two networks regarding architecture, performance, and learning rate.
Hint
You can adjust the following parameters, and other parameters not listed, as you like: learning rate, batch size, number of epochs, optimizer, transform function, weight decay, etc. You can also change the structure a bit, for instance by adding Batch Normalization[4] layers. Please do not use external well-defined networks, and please do not add more than 3 additional convolutional layers (beyond the original network).
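For orientation, here is a minimal training-loop sketch showing where these hyperparameters enter. It assumes the ConvNet sketch from Section 1.2 and a `train_loader` built as in Section 1.3; the specific values (learning rate, batch size, epochs, weight decay, Adam) are placeholders, not tuned settings.

```python
import torch
import torch.nn as nn

# Placeholder hyperparameters: tune these yourself.
learning_rate = 1e-3
batch_size = 64
num_epochs = 20
weight_decay = 5e-4

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ConvNet().to(device)                      # or TwolayerNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,
                             weight_decay=weight_decay)

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:           # built as in Section 1.3
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)   # forward pass + loss
        loss.backward()                           # backpropagation
        optimizer.step()                          # parameter update
```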
2 Session 2: Fine-tuning the ConvNet
In the previous session, the above-implemented network (ConvNet) was trained on a dataset named CIFAR-10, which contains images of 10 different object categories. The size of each image is 32 × 32 × 3. In this session, we will use a subset of STL-10 with larger image sizes and different object classes. Consequently, there is a discrepancy between the dataset we used for training (CIFAR-10) and the new dataset (STL-10). One solution is to train the whole network from scratch. However, the number of parameters is too large to be trained properly with the small number of images provided by STL-10. Another solution is to shift the learned weights so that they perform well on the new test set, while preserving as much information as necessary from the original training classes. This procedure is called transfer learning and has been widely used in the literature. Fine-tuning is often used in such circumstances: the weights of the pre-trained network are changed gradually. One way of fine-tuning is to keep the same architecture in all layers except the output layer, since the number of output classes changes (from 10 to 5).
2.1 STL-10 Dataset
2.2 Fine-tuning ConvNet
In this case, you need to modify the output layer of the pre-trained ConvNet module from 10 to 5 classes. You can either load the pre-trained parameters and then modify the output layer, or change the output layer first and then load the matching pre-trained parameters. You can find examples at https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html and https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html.
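A minimal sketch of the first approach is shown below. It assumes the ConvNet sketch from Section 1.2, whose final layer is named `fc3`; the checkpoint path 'convnet_cifar10.pth' is a placeholder for wherever you saved the CIFAR-10 weights.

```python
import torch
import torch.nn as nn

# Load the CIFAR-10 pre-trained weights into the original 10-class architecture.
model = ConvNet(num_classes=10)
model.load_state_dict(torch.load('convnet_cifar10.pth'))

# Replace the 10-way output layer with a 5-way one for the STL-10 subset.
model.fc3 = nn.Linear(84, 5)

# Optionally freeze the earlier layers and fine-tune only the new head:
# for name, param in model.named_parameters():
#     param.requires_grad = name.startswith('fc3')
```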
2.3 Bonus (optional)
References
[1] LeCun, Y., Bottou, L., Bengio, Y., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[2] LeCun, Y., Bottou, L., Bengio, Y., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[3] Sharif Razavian, A., et al. CNN features off-the-shelf: An astounding baseline for recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
[4] Ioffe, S., and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, PMLR, 2015.