CSC411 Assignment 4: AlexNet

  1. AlexNet. For this question, you will first read the following paper:

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

This is a highly influential paper (over 45,000 citations on Google Scholar!) because it was one of the first papers to demonstrate impressive performance for a neural network on a modern computer vision benchmark. It generated a lot of excitement in both academia and the tech industry. The architecture presented in this paper is widely used today and is known as AlexNet, after the first author. Reading this paper will also help you review many of the important concepts from this class.

  • The authors use a conv net architecture with five convolution layers and three fully connected layers (one of which is the output layer). Your job is to count the number of units, the number of weights, and the number of connections in each layer, i.e., to complete the following table:
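| Layer                   | # Units | # Weights | # Connections |
|-------------------------|---------|-----------|---------------|
| Convolution Layer 1     |         |           |               |
| Convolution Layer 2     |         |           |               |
| Convolution Layer 3     |         |           |               |
| Convolution Layer 4     |         |           |               |
| Convolution Layer 5     |         |           |               |
| Fully Connected Layer 1 |         |           |               |
| Fully Connected Layer 2 |         |           |               |
| Output Layer            |         |           |               |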

You can ignore the pooling layers when doing these calculations, i.e., you don't need to consider the units in the pooling layers or the connections between convolution and pooling layers. You can also ignore the biases. Note that the paper gives the number of units in each layer in the caption to Figure 2. Therefore, we won't mark the units column, though you would benefit from trying to work it out yourself.

When counting the number of connections, we'll adopt the convention that when the input to a convolution layer is zero-padded, the connections to the dummy zero values count towards the total. (This is the most convenient convention, since it means every unit in a given layer has the same number of incoming connections.)

  • Now suppose you're working at a software company and want to use an architecture similar to AlexNet in a product. Your project manager gives you some additional instructions. For each of the following scenarios, based on your answers to Part 1, suggest a change to the architecture that will help achieve the desired objective, i.e., modify the sizes of one or more layers. (The scenarios are independent.)
    i. You want to reduce the memory usage at test time so that the network can be run on a cell phone; this requires reducing the number of parameters in the network.
    ii. Your network will need to make very rapid predictions at test time. You want to reduce the number of connections, since there is approximately one add-multiply operation per connection.
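To check the table arithmetic, and to see which layers each scenario would affect, here is a minimal Python sketch. The layer shapes are assumptions read off the paper's Figure 2 and Section 3.5: output maps of 55×55, 27×27, and 13×13; kernels in conv2, conv4, and conv5 that see only the channels on their own GPU (48, 192, 192); and a 6×6×256 pooled input to the first fully connected layer. The helper names (`conv_counts`, `fc_counts`, `alexnet_table`) are hypothetical and not part of any library.

```python
# Layer shapes ASSUMED from the paper (Figure 2, Section 3.5), including the
# two-GPU split where a kernel only sees the channels on its own GPU.
CONV_LAYERS = [
    # name, output height, output width, output channels, kernel size,
    # input channels seen by each kernel
    ("conv1", 55, 55, 96, 11, 3),
    ("conv2", 27, 27, 256, 5, 48),
    ("conv3", 13, 13, 384, 3, 256),
    ("conv4", 13, 13, 384, 3, 192),
    ("conv5", 13, 13, 256, 3, 192),
]
FC_LAYERS = [
    # name, input units, output units (fc1 input assumed to be 6x6x256)
    ("fc1", 6 * 6 * 256, 4096),
    ("fc2", 4096, 4096),
    ("output", 4096, 1000),
]


def conv_counts(out_h, out_w, out_c, k, in_c):
    """Units, weights, connections of a conv layer, ignoring biases.

    Weights are shared across spatial positions, so the weight count does not
    depend on the output size. Every output unit has k*k*in_c incoming
    connections, counting zero-padded inputs as the assignment specifies.
    """
    units = out_h * out_w * out_c
    fan_in = k * k * in_c
    return units, out_c * fan_in, units * fan_in


def fc_counts(n_in, n_out):
    """Units, weights, connections of a fully connected layer, ignoring biases."""
    return n_out, n_in * n_out, n_in * n_out


def alexnet_table():
    """One (name, units, weights, connections) row per layer."""
    rows = [(name, *conv_counts(*spec)) for name, *spec in CONV_LAYERS]
    rows += [(name, *fc_counts(n_in, n_out)) for name, n_in, n_out in FC_LAYERS]
    return rows


if __name__ == "__main__":
    rows = alexnet_table()
    for name, units, weights, conns in rows:
        print(f"{name:>7} {units:>9,} {weights:>12,} {conns:>13,}")
    conv_w = sum(r[2] for r in rows[:5])
    fc_w = sum(r[2] for r in rows[5:])
    conv_c = sum(r[3] for r in rows[:5])
    fc_c = sum(r[3] for r in rows[5:])
    print(f"share of weights in FC layers:       {fc_w / (conv_w + fc_w):.1%}")
    print(f"share of connections in conv layers: {conv_c / (conv_c + fc_c):.1%}")
```

Compare the units column against the Figure 2 caption before relying on the assumed shapes. The two printed shares give a quick sense of which layer type each scenario should target.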
  2. Gaussian Naïve Bayes. In this question, you will derive the maximum likelihood estimates for Gaussian naïve Bayes, which is just like the naïve Bayes model from lecture, except that the features are continuous, and the conditional distribution of each feature given the class is (univariate) Gaussian rather than Bernoulli. Start with the following generative model for a discrete class label $y \in \{1, 2, \ldots, K\}$ and a real-valued vector of $d$ features $x = (x_1, x_2, \ldots, x_d)$:

$$p(y = k) = \alpha_k \qquad (1)$$

$$p(x \mid y = k, \mu, \sigma) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(x_i - \mu_{ki})^2}{2\sigma_i^2} \right) \qquad (2)$$

where $\alpha_k$ is the prior on class $k$, $\sigma_i^2$ are the variances for each feature, which are shared between all classes, and $\mu_{ki}$ is the mean of feature $i$ conditioned on class $k$. We write $\alpha$ to represent the vector with elements $\alpha_k$, and similarly $\sigma$ is the vector of variances. The matrix of class means is written $\mu$, where the $k$th row of $\mu$ is the mean for class $k$.
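The generative model can be made concrete with a short sampler: first draw a class from the priors, then draw each feature independently from its class-conditional Gaussian. This is a minimal sketch using only Python's standard library. `sample_gnb` is a hypothetical helper; `alpha`, `mu`, and `var` stand for the class priors, the K×d matrix of class means, and the shared per-feature variances; labels are 0-based rather than 1-based.

```python
import math
import random


def sample_gnb(n, alpha, mu, var, seed=0):
    """Draw n (y, x) pairs from a Gaussian naive Bayes generative model.

    alpha: class priors (length K, summing to 1)
    mu:    K x d nested list of class means
    var:   length-d list of per-feature variances shared by all classes
    Labels are 0..K-1 here (hypothetical convention), not 1..K.
    """
    rng = random.Random(seed)
    classes = list(range(len(alpha)))
    data = []
    for _ in range(n):
        # Draw the class label from the categorical prior.
        k = rng.choices(classes, weights=alpha)[0]
        # Given the class, features are independent univariate Gaussians;
        # random.gauss takes a standard deviation, hence the sqrt.
        x = [rng.gauss(mu[k][i], math.sqrt(var[i])) for i in range(len(var))]
        data.append((k, x))
    return data
```

For example, `sample_gnb(100, [0.3, 0.7], [[0.0, 1.0], [2.0, -1.0]], [1.0, 0.5])` returns 100 labeled two-feature examples that can be used to test the estimators derived below.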

  • Use Bayes' rule to derive an expression for $p(y = k \mid x, \mu, \sigma)$. Hint: Use the law of total probability to derive an expression for $p(x \mid \mu, \sigma)$.
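  • Write down an expression for the negative log-likelihood function (NLL)

$$\ell(\theta; D) = -\log p(y^{(1)}, x^{(1)}, y^{(2)}, x^{(2)}, \ldots, y^{(N)}, x^{(N)} \mid \theta) \qquad (3)$$

of a particular dataset $D = \{(y^{(1)}, x^{(1)}), (y^{(2)}, x^{(2)}), \ldots, (y^{(N)}, x^{(N)})\}$ with parameters $\theta = \{\alpha, \mu, \sigma\}$. (Assume the data are i.i.d.)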

  • Take partial derivatives of the likelihood with respect to each of the parameters $\mu_{ki}$ and with respect to the shared variances $\sigma_i^2$. Based on this, find the maximum likelihood estimates for $\mu$ and $\sigma$. You may assume that each class appears at least once in the dataset.
  • Show that the MLE for $\alpha_k$ is given by the following equation:

$$\hat{\alpha}_k = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}\left[y^{(j)} = k\right] \qquad (4)$$

You may assume that each class appears at least once. You will find it helpful to read about Lagrange multipliers[1].
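A small pure-Python sketch can be used to sanity-check these derivations numerically. It evaluates the posterior by Bayes' rule (with the law of total probability supplying the denominator, computed via log-sum-exp), the NLL of an i.i.d. dataset, and the closed-form maximum likelihood estimates: class frequencies for the priors, per-class feature means, and a single variance per feature pooled over all N examples. All function names are hypothetical, labels are 0-based, and `fit_mle` assumes every class appears at least once.

```python
import math


def log_normal(x, mean, var):
    """log N(x; mean, var) for a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_joint(x, k, alpha, mu, var):
    """log p(y = k, x) = log alpha_k + sum_i log N(x_i; mu_ki, var_i)."""
    return math.log(alpha[k]) + sum(
        log_normal(xi, m, v) for xi, m, v in zip(x, mu[k], var)
    )


def posterior(x, alpha, mu, var):
    """p(y = k | x) for every class k via Bayes' rule.

    The denominator p(x) = sum_k p(y = k, x) is computed with log-sum-exp
    so that very negative log-densities do not underflow.
    """
    scores = [log_joint(x, k, alpha, mu, var) for k in range(len(alpha))]
    top = max(scores)
    log_px = top + math.log(sum(math.exp(s - top) for s in scores))
    return [math.exp(s - log_px) for s in scores]


def nll(data, alpha, mu, var):
    """Negative log-likelihood of i.i.d. (y, x) pairs: -sum_j log p(y_j, x_j)."""
    return -sum(log_joint(x, y, alpha, mu, var) for y, x in data)


def fit_mle(data, num_classes):
    """Closed-form MLEs for (alpha, mu, var); every class must occur."""
    n = len(data)
    d = len(data[0][1])
    counts = [0] * num_classes
    sums = [[0.0] * d for _ in range(num_classes)]
    for y, x in data:
        counts[y] += 1
        for i, xi in enumerate(x):
            sums[y][i] += xi
    # Prior: fraction of examples carrying each label.
    alpha = [c / n for c in counts]
    # Mean: average of feature i over the examples of class k.
    mu = [[s / c for s in row] for row, c in zip(sums, counts)]
    # Shared variance: squared deviation of each example from its own
    # class mean, averaged over all n examples.
    var = [sum((x[i] - mu[y][i]) ** 2 for y, x in data) / n for i in range(d)]
    return alpha, mu, var
```

Checking that the NLL at the fitted parameters is no larger than at nearby parameter values is a quick way to catch a sign error in a derivative.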

[1] https://en.wikipedia.org/wiki/Lagrange_multiplier
