TDT4195 Assignment 1


This assignment will give you an introduction to basic image processing with Python, filtering in the spatial domain, and a simple introduction to building fully-connected neural networks with PyTorch.

Spatial Filtering

Task 1: Theory

A digital image is constructed from an image sensor. An image sensor outputs a continuous voltage waveform that represents the image, and to construct a digital image, we need to convert this continuous signal into a digital form. This conversion involves two processes: sampling and quantization.

  • [0.1pt] Explain in one sentence what sampling is.
  • [0.1pt] Explain in one sentence what quantization is.
  • [0.2pt] Looking at an image histogram, how can you see that the image has high contrast?
  • [0.5pt] Perform histogram equalization by hand on the 3-bit (8 intensity levels) image in Figure 1a. Your report must include all the steps you did to compute the histogram, the transformation, and the transformed image. Round down any resulting pixel intensities that are not integers (use the floor operator).
  • [0.1pt] What happens to the dynamic range if we apply a log transform to an image with a large variance in pixel intensities?

Hint: A log transform is given by s = c log(1 + r), where r and s are the pixel intensities before and after the transformation, respectively, and c is a constant.

  • [0.5pt] Perform spatial convolution by hand on the image in Figure 1a using the kernel in Figure 1b. The convolved image should be 3 × 5. You are free to choose how you handle boundary conditions, and state how you handle them in the report.
(a) A 3 × 5 image:

    6 7 5 4 6
    4 5 7 0 7
    7 1 6 6 3

(b) A 3 × 3 Sobel kernel:

    1 0 -1
    2 0 -2
    1 0 -1

Figure 1: An image I and a convolutional kernel K. For the image, each square represents an image pixel, where the value inside is the pixel intensity in the [0,7] range (3-bit).

Task 2: Programming

In this task, you can choose to use either the provided Python files (task2ab.py, task2c.py) or Jupyter notebooks (task2ab.ipynb, task2c.ipynb).

Basic Image Processing

Converting a color image to grayscale representation can be done by taking a weighted average of the three color channels, red (R), green (G), and blue (B). One such weighted average used by the sRGB color space is:

grey_{i,j} = 0.212 R_{i,j} + 0.7152 G_{i,j} + 0.0722 B_{i,j}    (1)

Complete the following tasks in Python 3. Use the functions given in the file task2ab.py in the starter code.

NOTE: Do not change the name of the file, the signature of the function, or the type of the returned image in the function. Task 2 will be automatically evaluated, and to ensure that the return output of your function has the correct shape, we have included a set of assertions at the end of the given code. Do not change this.

  • Implement a function that converts an RGB image to greyscale. Use Equation 1. Implement this in the function greyscale.

In your report, include the image lake.jpg as a greyscale image.
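
A minimal sketch of how greyscale could look, assuming the image is given as a NumPy array of shape (H, W, 3) with the channel order R, G, B (the exact interface of the starter code may differ):

    import numpy as np

    def greyscale(im):
        """Convert an (H, W, 3) RGB image to greyscale using Equation 1."""
        # Weighted sum over the channel axis with the weights from Equation 1.
        weights = np.array([0.212, 0.7152, 0.0722])
        return im @ weights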

  • Implement a function that takes a grayscale image and applies the intensity transformation T(p) = 1 − p. Implement this in the function inverse.

In your report, apply the transformation to lake.jpg and include the resulting image.

Tip: if the image is in the range [0, 255], then the transformation must be changed to T(p) = 255 − p.
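
A possible sketch of inverse, assuming a NumPy array with intensities in the [0, 255] range (use 1.0 instead of 255 for images in [0, 1]):

    def inverse(im):
        """Apply the intensity transformation T(p) = 255 - p to a greyscale image."""
        # For an image normalized to [0, 1], return 1.0 - im instead.
        return 255 - im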

Spatial Convolution

Equation 2 shows two convolutional kernels. h_a is a 3 × 3 Sobel kernel, and h_b is a 5 × 5 approximated Gaussian kernel.

h_a =
     1  0 -1
     2  0 -2
     1  0 -1

h_b = (1/256) ·
     1  4  6  4  1
     4 16 24 16  4
     6 24 36 24  6
     4 16 24 16  4
     1  4  6  4  1

(2)

  • Implement a function that takes an RGB image and a convolutional kernel as input, and performs 2D spatial convolution. Assume the size of the kernel is odd, e.g. 3 × 3, 5 × 5, or 7 × 7. You must implement the convolution operation yourself from scratch; a sketch of one possible approach follows the tips below.

Implement the function in convolve_im.

You are not required to implement a procedure for adding or removing padding (you can return zero in cases when the convolutional kernel goes outside the original image).

In your report, test the convolution function you implemented. Convolve the image lake.jpg with the Sobel kernel (h_a) and the smoothing kernel (h_b) in Equation 2. Show both images in your report.

Tip: To convolve a color image, convolve each channel separately and concatenate them afterward.
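
One way convolve_im could be sketched, assuming NumPy arrays and treating values outside the image as zero, which is one of the allowed boundary choices:

    import numpy as np

    def convolve_im(im, kernel):
        """2D spatial convolution of an (H, W, 3) RGB image with an odd-sized kernel."""
        k_h, k_w = kernel.shape
        pad_h, pad_w = k_h // 2, k_w // 2
        # Convolution flips the kernel; correlating with the flipped kernel
        # is equivalent to convolving with the original one.
        flipped = kernel[::-1, ::-1]
        # Zero-pad the two spatial dimensions so border pixels see zeros.
        padded = np.pad(im, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
        out = np.zeros(im.shape, dtype=float)
        for i in range(im.shape[0]):
            for j in range(im.shape[1]):
                for c in range(im.shape[2]):  # convolve each channel separately
                    window = padded[i:i + k_h, j:j + k_w, c]
                    out[i, j, c] = np.sum(window * flipped)
        return out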

Neural Networks

Task 3: Theory

A neural network consists of a number of parameters (weights or biases). To train a neural network, we require a cost function (also known as an error function, loss function, or an objective function). A typical cost function for regression problems is the L2 loss.

C(y, ŷ) = 1/2 (y − ŷ)²,    (3)

where ŷ is the output of our neural network, and y is the target value of the training example. This cost function is used to optimize our parameters by showing our neural network several training examples with given target values.

To find the direction in which to update our parameters, we use gradient descent. For each training example, we can update each parameter with the following:

w_{t+1} = w_t − α ∂C/∂w_t,    (4)

where α is the learning rate, and w_t is the parameter at time step t.

By using this knowledge, we can derive a typical approach to update our parameters over N training examples.

Algorithm 1 Stochastic Gradient Descent

1: procedure SGD
2:     w_0 ← 0
3:     for n = 0, ..., N do
4:         x_n, y_n ← Select training sample n
5:         ŷ_n ← Forward pass x_n through our network
6:         w_{t+1} ← w_t − α ∂C/∂w_t
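
To make Equation 4 and Algorithm 1 concrete, here is a minimal NumPy sketch of SGD for a one-parameter linear model trained with the L2 loss of Equation 3 (the data and model are made up for illustration):

    import numpy as np

    # Toy data from the line y = 2x, so the optimal weight is w = 2.
    xs = np.array([1.0, 2.0, 3.0, 4.0])
    ys = 2.0 * xs

    w = 0.0        # w_0 <- 0
    alpha = 0.1    # learning rate

    for x_n, y_n in zip(xs, ys):         # loop over the training samples
        y_hat = w * x_n                  # forward pass through the "network"
        # C = 1/2 (y - y_hat)^2  =>  dC/dw = (y_hat - y) * x
        grad = (y_hat - y_n) * x_n
        w = w - alpha * grad             # gradient descent step (Equation 4)

    print(w)  # moves toward 2 with each pass over the data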

  • A single-layer neural network is a linear function. Give an example of a binary operation that a single-layer neural network cannot represent (either AND, OR, NOT, NOR, NAND, or XOR).
  • Explain in one sentence what a hyperparameter for a neural network is. Give two examples of hyperparameters.
  • Why is the softmax activation function used in the last layer for neural networks trained to classify objects?
  • Figure 2 shows a simple neural network. Perform a forward pass and backward pass on this network with the given input values. Use Equation 3 as the cost function and let the target value be y = 1.

Find and report the final values for ∂C/∂w1, ∂C/∂w3, and ∂C/∂b1.

Explain each step in the computation, such that it is clear how you compute the derivatives.

  • Compute the updated weights w1, w3, and b1 by using gradient descent and the values you found in task d. Use α = 0.1. (A sketch for checking these computations with PyTorch follows Figure 2's caption below.)

Figure 2: A simple neural network with 4 input nodes, 4 weights, and 2 biases. C is the cost function. To clarify the notation: a1 = w1 · x1, c1 = a1 + a2 + b1, ŷ = max(c1, c2).
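
Since the numeric values of Figure 2 are not reproduced in this text, here is a hedged sketch of how PyTorch's autograd could be used to sanity-check the hand-computed derivatives. All numbers below are placeholders, and c2 = w3 · x3 + w4 · x4 + b2 is an assumption inferred from the caption (4 inputs, 4 weights, 2 biases):

    import torch

    # Placeholder values -- substitute the actual numbers from Figure 2.
    x1, x2, x3, x4 = 1.0, 2.0, 3.0, 4.0
    w1 = torch.tensor(0.5, requires_grad=True)
    w2 = torch.tensor(0.5, requires_grad=True)
    w3 = torch.tensor(0.5, requires_grad=True)
    w4 = torch.tensor(0.5, requires_grad=True)
    b1 = torch.tensor(0.0, requires_grad=True)
    b2 = torch.tensor(0.0, requires_grad=True)
    y = 1.0  # target value

    a1, a2 = w1 * x1, w2 * x2
    c1 = a1 + a2 + b1
    c2 = w3 * x3 + w4 * x4 + b2       # assumed by symmetry with c1
    y_hat = torch.max(c1, c2)
    C = 0.5 * (y - y_hat) ** 2        # Equation 3

    C.backward()
    # Note: only the branch selected by max receives a nonzero gradient.
    print(w1.grad, w3.grad, b1.grad)  # compare with your hand computations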

Task 4: Programming

In this task, you can choose to use either the provided Python file (task4.py) or the Jupyter notebook (task4.ipynb).

In this task, we will develop a model to classify digits from the MNIST dataset. The MNIST dataset consists of 70,000 handwritten digits, split into 10 object classes (the numbers 0-9), where each image is a 28 × 28 grayscale image. The images are split into two datasets: a training set consisting of 60,000 images, and a testing set consisting of 10,000 images. For this task, we will use the testing set of MNIST as a validation set.

To develop your model, we recommend using PyTorch. PyTorch is a high-level framework for developing and training neural networks. It simplifies development, as time-consuming tasks are abstracted away; for example, gradient update rules are derived through automatic differentiation.

With this assignment, we provide starter code to develop your model with PyTorch. This starter code implements a bare-bones example of how to train a single-layer neural network on MNIST. In the lectures, we will also give you an introduction and a deeper dive into how PyTorch works.

For all tasks, use the hyperparameters given in the notebook/python script, except if stated otherwise in the subtask. Use a batch size of 64, learning rate of 0.0192, and train the network for 5 epochs.

  • Use the given starter code and train a single-layer neural network with batch size of 64.

Then, normalize every image to the range [-1, 1], and train the network again.

Plot the training and validation loss from both of the networks in the same graph. Include the graph in your report. Do you notice any difference when training your network with/without normalization?

Tip: You can normalize the image to the range of [-1, 1] by using an image transform. Use torchvision.transforms.Normalize with mean = 0.5, and std = 0.5, and include it after transforms.ToTensor().

From this task, use normalization for every subsequent task.
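
The tip above could look roughly like this in the data-loading code; the surrounding pipeline is an assumption about the starter code's structure:

    import torchvision.transforms as transforms

    # Maps pixel values from [0, 1] (after ToTensor) to [-1, 1] via (x - 0.5) / 0.5.
    image_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ])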

  • The trained neural network will have one weight with shape [num_classes, 28 × 28]. To visualize the learned weight, we can plot the weight as a 28 × 28 grayscale image.

For each digit (0-9), plot the learned weight as a 28 × 28 image. In your report, include the image for each weight, and describe what you observe (1-2 sentences).

Tip: You can access the weight of the fully connected layer by using the following snippet: weight = list(model.children())[1].weight.cpu().data
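
Building on the snippet above, the ten weight images could be plotted as follows, assuming model is the trained single-layer network from the starter code:

    import matplotlib.pyplot as plt

    # Weight of the fully connected layer, shape [num_classes, 28*28].
    weight = list(model.children())[1].weight.cpu().data

    fig, axes = plt.subplots(1, 10, figsize=(20, 2))
    for digit, ax in enumerate(axes):
        # Reshape the row for this digit back into a 28 x 28 image.
        ax.imshow(weight[digit].reshape(28, 28), cmap="gray")
        ax.set_title(str(digit))
        ax.axis("off")
    plt.savefig("weights.png")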

  • Set the learning rate to lr = 1.0, and train the network from scratch.

Report the accuracy and average cross entropy loss on the validation set. In 1-2 sentences, explain why the network achieves worse/better accuracy than previously.

Tip: To observe what is happening to the loss, you should change the plt.ylim argument.

  • Include a hidden layer with 64 nodes in the network, with ReLU as the activation function for the first layer. Train this network with the same hyperparameters as previously.

Plot the training and validation loss from this network together with the loss from task (a). Include the plot in your report. What do you observe?
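
A sketch of how the model definition might change, assuming the starter code builds the network with nn.Sequential (the exact structure of the provided code may differ):

    import torch.nn as nn

    num_input_nodes = 28 * 28
    num_hidden = 64
    num_classes = 10

    model = nn.Sequential(
        nn.Flatten(),                            # [batch, 1, 28, 28] -> [batch, 784]
        nn.Linear(num_input_nodes, num_hidden),  # hidden layer with 64 nodes
        nn.ReLU(),                               # activation for the first layer
        nn.Linear(num_hidden, num_classes),
    )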
