CSCI5561 Homework 4: Neural Network


Figure 1: You will implement (1) a multi-layer perceptron (neural network) and (2) a convolutional neural network to recognize hand-written digits using the MNIST dataset.

The goal of this assignment is to implement neural networks to recognize hand-written digits in the MNIST data.

MNIST Data You will use the MNIST hand-written digit dataset to perform the first task (neural network). We reduce the image size (28 × 28 → 14 × 14) and subsample the data. You can download the training and testing data from here: http://www.cs.umn.edu/~hspark/csci5561/ReducedMNIST.zip

Description: The zip file includes two MAT files (mnist_train.mat and mnist_test.mat). Each file includes im_* and label_* variables:

  • im_* is a matrix (196 × n) storing vectorized image data (196 = 14 × 14)
  • label_* is an n × 1 vector storing the label for each image.

n is the number of images. You can visualize the ith image, e.g., imshow(uint8(reshape(im_train(:,i), [14,14]))).
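For example, a minimal MATLAB snippet for loading the data and visualizing one training image (assuming the MAT files are on the path and the variables are named as described above) might look like:

```matlab
% Load the reduced MNIST data (im_train: 196 x n, label_train: n x 1).
load('mnist_train.mat');
load('mnist_test.mat');

% Visualize the i-th training image and its label.
i = 1;
imshow(uint8(reshape(im_train(:, i), [14, 14])));
title(sprintf('label: %d', label_train(i)));
```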

2 Single-layer Linear Perceptron

Figure 2: You will implement a single linear perceptron that produces accuracy near 30%. Random chance is 10% on testing data.

You will implement a single-layer linear perceptron (Figure 2(a)) with a stochastic gradient descent method. We provide main_slp_linear where you will implement GetMiniBatch and TrainSLP_linear.

function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)

Input: im_train and label_train are a set of images and labels, and batch_size is the size of the mini-batch for stochastic gradient descent.

Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a matrix of size 196 × batch_size, and each batch of labels is a matrix of size 10 × batch_size (one-hot encoding). Note that the number of images in the last batch may be smaller than batch_size.

Description: You may randomly permute the order of images when building the batches, and the whole set of mini_batch_* must span all of the training data.
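A minimal sketch of GetMiniBatch, assuming the labels are digits 0-9 (if they are stored as 1-10, drop the +1):

```matlab
function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)
% Randomly permute the training set, then slice it into batches of size
% batch_size; the last batch may be smaller. Labels are one-hot encoded.
n = size(im_train, 2);
perm = randperm(n);
n_batch = ceil(n / batch_size);
mini_batch_x = cell(1, n_batch);
mini_batch_y = cell(1, n_batch);
for k = 1 : n_batch
    idx = perm((k-1)*batch_size + 1 : min(k*batch_size, n));
    labels = double(label_train(idx));
    labels = labels(:)';                       % force a row vector
    one_hot = zeros(10, numel(idx));
    one_hot(sub2ind(size(one_hot), labels + 1, 1:numel(idx))) = 1;
    mini_batch_x{k} = im_train(:, idx);
    mini_batch_y{k} = one_hot;
end
end
```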

function y = FC(x, w, b)

Input: x ∈ ℝ^m is the input to the fully connected layer, and w ∈ ℝ^(n×m) and b ∈ ℝ^n are the weights and bias.

Output: y ∈ ℝ^n is the output of the linear transform (fully connected layer).

Description: FC is a linear transform of x, i.e., y = wx + b.
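A minimal sketch of FC under the shapes above:

```matlab
function [y] = FC(x, w, b)
% Linear transform y = w*x + b, with x: m x 1, w: n x m, b: n x 1.
y = w * x(:) + b;
end
```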

function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)

Input: dLdy ∈ ℝ^(1×n) is the loss derivative with respect to the output y.

Output: dLdx ∈ ℝ^(1×m) is the loss derivative with respect to the input x, dLdw ∈ ℝ^(1×nm) is the loss derivative with respect to the weights, and dLdb ∈ ℝ^(1×n) is the loss derivative with respect to the bias.

Description: The partial derivatives w.r.t. input, weights, and bias will be computed. dLdx will be back-propagated, and dLdw and dLdb will be used to update the weights and bias.
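A minimal sketch of FC_backward; the vectorization order of dLdw is a convention and only has to match the reshape used later in the weight update:

```matlab
function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)
% dLdy: 1 x n, x: m x 1, w: n x m, b: n x 1.
dLdx = dLdy * w;                         % 1 x m
dLdw = reshape(dLdy(:) * x(:)', 1, []);  % 1 x (n*m), since dL/dw_ij = dLdy_i * x_j
dLdb = dLdy;                             % 1 x n
end
```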

function [L, dLdy] = Loss_euclidean(y_tilde, y)

Input: y_tilde ∈ ℝ^m is the prediction, and y ∈ {0,1}^m is the ground-truth label.

Output: L ∈ ℝ is the loss, and dLdy is the loss derivative with respect to the prediction.

Description: Loss_euclidean measures the Euclidean distance L = ‖ỹ − y‖², where ỹ is the prediction y_tilde.
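A minimal sketch of Loss_euclidean (the constant factor in the derivative is a convention; dropping the 2 also works as long as it is used consistently):

```matlab
function [L, dLdy] = Loss_euclidean(y_tilde, y)
% Squared Euclidean distance between prediction and one-hot label.
L = sum((y_tilde(:) - y(:)).^2);
dLdy = 2 * (y_tilde(:) - y(:))';   % 1 x m, so it can be fed to FC_backward
end
```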

function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)

Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images and labels.

Output: w ∈ ℝ^(10×196) and b ∈ ℝ^(10×1) are the trained weights and bias of a single-layer perceptron.

Description: You will use FC, FC_backward, and Loss_euclidean to train a single-layer perceptron using a stochastic gradient descent method; pseudo-code can be found in Algorithm 1 below. Through training, you are expected to see a reduction of the loss, as shown in Figure 2(b). As a result of training, the network should produce an accuracy of more than 25% on the testing data (Figure 2(c)).

Algorithm 1 Stochastic Gradient Descent based Training

1: Set the learning rate γ
2: Set the decay rate λ ∈ (0,1]
3: Initialize the weights with Gaussian noise, w ← N(0,1)
4: k = 1
5: for iIter = 1 : nIters do
6:   At every 1000th iteration, γ ← λγ
7:   ∂L/∂w ← 0 and ∂L/∂b ← 0
8:   for each image x_i in the kth mini-batch do
9:     Label prediction of x_i
10:    Loss computation l
11:    Gradient back-propagation of x_i, ∂l/∂w and ∂l/∂b, using back-propagation
12:    ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13:  end for
14:  k++ (Set k = 1 if k is greater than the number of mini-batches.)
15:  Update the weights, w ← w − (γ/R) ∂L/∂w, and the bias, b ← b − (γ/R) ∂L/∂b, where R is the mini-batch size
16: end for
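A minimal sketch of TrainSLP_linear following Algorithm 1; the learning rate, decay rate, number of iterations, and initialization below are placeholder choices, not prescribed values. (TrainSLP in Section 3 has the same structure with Loss_cross_entropy_softmax in place of Loss_euclidean.)

```matlab
function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)
% Placeholder hyper-parameters -- tune as needed.
gamma = 0.01; lambda = 0.9; nIters = 5000;
w = randn(10, 196);                     % Gaussian initialization of the weights
b = zeros(10, 1);
k = 1;
n_batch = numel(mini_batch_x);
for iIter = 1 : nIters
    if mod(iIter, 1000) == 0, gamma = lambda * gamma; end
    dLdw_sum = zeros(1, 10*196);
    dLdb_sum = zeros(1, 10);
    R = size(mini_batch_x{k}, 2);       % number of images in this batch
    for i = 1 : R
        x = mini_batch_x{k}(:, i);
        y = mini_batch_y{k}(:, i);
        y_tilde = FC(x, w, b);
        [~, dLdy] = Loss_euclidean(y_tilde, y);
        [~, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y_tilde);
        dLdw_sum = dLdw_sum + dLdw;
        dLdb_sum = dLdb_sum + dLdb;
    end
    k = k + 1;
    if k > n_batch, k = 1; end
    w = w - (gamma / R) * reshape(dLdw_sum, size(w));
    b = b - (gamma / R) * dLdb_sum(:);
end
end
```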

3 Single-layer Perceptron

Figure 3: You will implement a single perceptron that produces accuracy near 90% on testing data.

You will implement a single-layer perceptron with soft-max cross-entropy using a stochastic gradient descent method. We provide main_slp where you will implement TrainSLP. Unlike the single-layer linear perceptron, it has a soft-max layer that approximates a max function by clamping the output to the [0,1] range, as shown in Figure 3(a).

function [L, dLdy] = Loss_cross_entropy_softmax(x, y)

Input: x ∈ ℝ^m is the input to the soft-max, and y ∈ {0,1}^m is the ground-truth label.

Output: L ∈ ℝ is the loss, and dLdy is the loss derivative with respect to x.

Description: Loss_cross_entropy_softmax measures the cross-entropy between the two distributions,

L = −∑_i y_i log ỹ_i,

where ỹ_i is the soft-max output that approximates the max operation by clamping x to the [0,1] range:

ỹ_i = exp(x_i) / ∑_j exp(x_j),

where x_i is the ith element of x.
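A minimal sketch of Loss_cross_entropy_softmax; the max subtraction and the eps term are for numerical stability only:

```matlab
function [L, dLdy] = Loss_cross_entropy_softmax(x, y)
% x: m x 1 input to the soft-max, y: m x 1 one-hot label.
x = x(:); y = y(:);
ex = exp(x - max(x));
y_tilde = ex / sum(ex);             % soft-max output in [0, 1]
L = -sum(y .* log(y_tilde + eps));  % cross-entropy
dLdy = (y_tilde - y)';              % derivative w.r.t. x, returned as 1 x m
end
```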

function [w, b] = TrainSLP(mini_batch_x, mini_batch_y)

Output: w ∈ ℝ^(10×196) and b ∈ ℝ^(10×1) are the trained weights and bias of the single-layer perceptron.

Description: You will use the following functions to train a single-layer perceptron using a stochastic gradient descent method: FC, FC_backward, and Loss_cross_entropy_softmax.

Through training, you are expected to see a reduction of the loss, as shown in Figure 3(b). As a result of training, the network should produce an accuracy of more than 85% on the testing data (Figure 3(c)).

4 Multi-layer Perceptron

Figure 4: You will implement a multi-layer perceptron that produces an accuracy of more than 90% on testing data.

You will implement a multi-layer perceptron with a single hidden layer using a stochastic gradient descent method. We provide main_mlp. The hidden layer is composed of 30 units as shown in Figure 4(a).

function [y] = ReLu(x)

Input: x is a general tensor, matrix, or vector.

Output: y is the output of the Rectified Linear Unit (ReLu) with the same size as the input.

Description: ReLu is an activation unit (y_i = max(0, x_i)). In some cases, it is possible to use a Leaky ReLu instead, y_i = max(εx_i, x_i) for a small ε > 0.

function [dLdx] = ReLu_backward(dLdy, x, y)

Input: dLdy ∈ ℝ^(1×z) is the loss derivative with respect to the output y ∈ ℝ^z, where z is the size of the input (it can be a tensor, matrix, or vector).

Output: dLdx ∈ ℝ^(1×z) is the loss derivative with respect to the input x.
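Minimal sketches of both ReLu functions (each would live in its own .m file):

```matlab
function [y] = ReLu(x)
% Element-wise max(0, x); works for vectors, matrices, and tensors.
y = max(0, x);
end
```

```matlab
function [dLdx] = ReLu_backward(dLdy, x, y)
% The gradient passes through only where the input was positive.
dLdx = dLdy .* reshape(x > 0, size(dLdy));
end
```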

function [w1, b1, w2, b2] = TrainMLP(mini_batch_x, mini_batch_y)

Output: w1 ∈ ℝ^(30×196), b1 ∈ ℝ^(30×1), w2 ∈ ℝ^(10×30), and b2 ∈ ℝ^(10×1) are the trained weights and biases of the multi-layer perceptron.

Description: You will use the following functions to train a multi-layer perceptron using a stochastic gradient descent method: FC, FC_backward, ReLu, ReLu_backward, and Loss_cross_entropy_softmax. As a result of training, the network should produce an accuracy of more than 90% on the testing data (Figure 4(b)).
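The per-sample forward/backward chain inside TrainMLP might look like the sketch below (the outer SGD loop mirrors Algorithm 1; x, y, and the weights/biases are assumed to be in scope with the shapes stated above):

```matlab
% Forward pass
a1 = FC(x, w1, b1);                            % hidden pre-activation, 30 x 1
f1 = ReLu(a1);                                 % hidden activation, 30 x 1
a2 = FC(f1, w2, b2);                           % output scores, 10 x 1
[L, dLda2] = Loss_cross_entropy_softmax(a2, y);

% Backward pass
[dLdf1, dLdw2, dLdb2] = FC_backward(dLda2, f1, w2, b2, a2);
dLda1 = ReLu_backward(dLdf1, a1, f1);
[~, dLdw1, dLdb1] = FC_backward(dLda1, x, w1, b1, a1);

% dLdw1, dLdb1, dLdw2, dLdb2 are accumulated over the mini-batch and then
% applied in the gradient step, as in Algorithm 1.
```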

5 Convolutional Neural Network

Figure 5: You will implement a convolutional neural network that produces an accuracy of more than 92% on testing data. (a) CNN: Input → Conv (3 channels) → ReLu → Pool (2×2) → Flatten → FC → Soft-max. (b) Confusion matrix (accuracy: 0.947251).

You will implement a convolutional neural network (CNN) using a stochastic gradient descent method. We provide main_cnn. As shown in Figure 5(a), the network is composed of: a single-channel input (14×14×1) → Conv layer (3×3 convolution with 3-channel output and stride 1) → ReLu layer → Max-pooling layer (2×2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max.

function [y] = Conv(x, w_conv, b_conv)

Input: x ∈ ℝ^(H×W×C1) is the input to the convolutional operation, and w_conv ∈ ℝ^(H×W×C1×C2) and b_conv ∈ ℝ^(C2) are the weights and bias of the convolutional operation.

Output: y ∈ ℝ^(H×W×C2) is the output of the convolutional operation. Note that to get the same size as the input, you may pad zeros at the boundary of the input image.

Description: This convolutional operation can be simplified using the MATLAB built-in function im2col.
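A minimal sketch of Conv built on im2col (Image Processing Toolbox), assuming odd filter sizes and implementing the usual CNN cross-correlation:

```matlab
function [y] = Conv(x, w_conv, b_conv)
% x: H x W x C1, w_conv: h x w x C1 x C2, b_conv: C2 x 1. 'Same'-size output.
[H, W, C1] = size(x);
[h, w, ~, C2] = size(w_conv);
x_pad = padarray(x, [(h-1)/2, (w-1)/2], 0, 'both');      % zero padding
y = zeros(H, W, C2);
for c2 = 1 : C2
    acc = zeros(H * W, 1);
    for c1 = 1 : C1
        cols = im2col(x_pad(:, :, c1), [h, w], 'sliding'); % (h*w) x (H*W)
        ker = w_conv(:, :, c1, c2);
        acc = acc + cols' * ker(:);
    end
    y(:, :, c2) = reshape(acc, H, W) + b_conv(c2);
end
end
```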

function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)

Input: dLdy is the loss derivative with respect to y.

Output: dLdw and dLdb are the loss derivatives with respect to convolutional weights and bias w and b, respectively.

Description: This convolutional operation can be simplified using the MATLAB built-in function im2col. Note that for this single convolutional layer, dLdx is not needed.
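A minimal sketch of Conv_backward, again via im2col; dLdy is assumed to be reshapeable to the H × W × C2 shape of y:

```matlab
function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)
% dLdx is not computed because the conv layer is the first layer here.
[H, W, ~] = size(x);
[h, w, C1, C2] = size(w_conv);
dLdy = reshape(dLdy, H, W, C2);
x_pad = padarray(x, [(h-1)/2, (w-1)/2], 0, 'both');
dLdw = zeros(size(w_conv));
dLdb = zeros(size(b_conv));
for c2 = 1 : C2
    g = reshape(dLdy(:, :, c2), [], 1);                     % (H*W) x 1
    for c1 = 1 : C1
        cols = im2col(x_pad(:, :, c1), [h, w], 'sliding');   % (h*w) x (H*W)
        dLdw(:, :, c1, c2) = reshape(cols * g, h, w);        % dL/dw for this slice
    end
    dLdb(c2) = sum(g);
end
end
```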

function [y] = Pool2x2(x)

Input: x ∈ ℝ^(H×W×C) is a general tensor or matrix.

Output: y ∈ ℝ^((H/2)×(W/2)×C) is the output of the 2×2 max-pooling operation with stride 2.

function [dLdx] = Pool2x2_backward(dLdy, x, y)

Input: dLdy is the loss derivative with respect to the output y.

Output: dLdx is the loss derivative with respect to the input x.
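Minimal sketches of the pooling pair, assuming H and W are even (each function in its own .m file):

```matlab
function [y] = Pool2x2(x)
% 2x2 max-pooling with stride 2.
[H, W, C] = size(x);
y = zeros(H/2, W/2, C);
for c = 1 : C
    for i = 1 : H/2
        for j = 1 : W/2
            patch = x(2*i-1 : 2*i, 2*j-1 : 2*j, c);
            y(i, j, c) = max(patch(:));
        end
    end
end
end
```

```matlab
function [dLdx] = Pool2x2_backward(dLdy, x, y)
% Route each output gradient to the arg-max position of its 2x2 window.
[H, W, C] = size(x);
dLdy = reshape(dLdy, H/2, W/2, C);
dLdx = zeros(size(x));
for c = 1 : C
    for i = 1 : H/2
        for j = 1 : W/2
            patch = x(2*i-1 : 2*i, 2*j-1 : 2*j, c);
            [~, idx] = max(patch(:));
            [u, v] = ind2sub([2, 2], idx);
            dLdx(2*i-2+u, 2*j-2+v, c) = dLdy(i, j, c);
        end
    end
end
end
```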

function [y] = Flattening(x)

Input: x ∈ ℝ^(H×W×C) is a tensor.

Output: y ∈ ℝ^(HWC) is the vectorized tensor (column-major).

function [dLdx] = Flattening_backward(dLdy, x, y)

Input: dLdy is the loss derivative with respect to the output y.

Output: dLdx is the loss derivative with respect to the input x.
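Minimal sketches of the flattening pair (column-major, matching MATLAB's reshape):

```matlab
function [y] = Flattening(x)
% Column-major vectorization of the H x W x C tensor.
y = x(:);
end
```

```matlab
function [dLdx] = Flattening_backward(dLdy, x, y)
% Undo the vectorization so the gradient has the same shape as x.
dLdx = reshape(dLdy, size(x));
end
```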

function [w_conv, b_conv, w_fc, b_fc] = TrainCNN(mini_batch_x, mini_batch_y)

Output: w_conv ∈ ℝ^(3×3×1×3), b_conv ∈ ℝ^3, w_fc ∈ ℝ^(10×147), and b_fc ∈ ℝ^(10×1) are the trained weights and biases of the CNN.

Description: You will use the following functions to train a convolutional neural network using a stochastic gradient descent method: Conv, Conv_backward, Pool2x2,

Pool2x2_backward, Flattening, Flattening_backward, FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce an accuracy of more than 92% on the testing data (Figure 5(b)).
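Inside TrainCNN, the per-sample forward/backward chain might look like the sketch below (the outer SGD loop again mirrors Algorithm 1; x is a 14 × 14 × 1 image and y its one-hot label):

```matlab
% Forward pass
a = Conv(x, w_conv, b_conv);          % 14 x 14 x 3
f = ReLu(a);
p = Pool2x2(f);                       % 7 x 7 x 3
v = Flattening(p);                    % 147 x 1
s = FC(v, w_fc, b_fc);                % 10 x 1
[L, dLds] = Loss_cross_entropy_softmax(s, y);

% Backward pass
[dLdv, dLdw_fc, dLdb_fc] = FC_backward(dLds, v, w_fc, b_fc, s);
dLdp = Flattening_backward(dLdv, p, v);
dLdf = Pool2x2_backward(dLdp, f, p);
dLda = ReLu_backward(dLdf, a, f);
[dLdw_conv, dLdb_conv] = Conv_backward(dLda, x, w_conv, b_conv, a);
```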
