Intro to Image Understanding (CSC420) Assignment 3

Due Date: November 8th, 2024, 10:59:00 pm
Total: 160 marks

General Instructions:

  • You are allowed to work directly with one other person to discuss the questions. However, you are still expected to write the solutions/code/report in your own words; i.e. no copying. If you choose to work with someone else, you must indicate this in your assignment submission. For example, on the first line of your report file (after your own name and information, and before starting your answer to Q1), you should have a sentence that says: “In solving the questions in this assignment, I worked together with my classmate [name & student number]. I confirm that I have written the solutions/code/report in my own words”.

  • Your submission should be in the form of an electronic report (PDF), with the answers to the specific questions (each question separately), and a presentation and discussion of your results. For this, please submit a file named report.pdf to MarkUs directly.

  • Submit the documented code that you have written to generate your results separately. Please store all of those files in a folder called assignment3, zip the folder and then submit the file assignment3.zip to MarkUs. You should include a README.txt file (inside the folder) which details how to run the submitted code.

  • Do not worry if you realize you made a mistake after submitting your zip file; you can submit multiple times on MarkUs.

Part I: Theoretical Problems (55 marks)

[Question 1] reparameterization trick (5 marks)

Briefly (2-3 sentences) explain the purpose of the reparameterization trick in a variational autoencoder.
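For reference, the trick typically looks like the following in PyTorch (a minimal sketch, assuming the encoder outputs a mean mu and a log-variance logvar for a diagonal Gaussian posterior):

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); the randomness is isolated in eps,
    # so gradients can flow back through mu and sigma to the encoder.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps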

[Question 2] GAN (5 marks)

In a GAN we have a generator and a discriminator. Calculating the loss function for which of them requires a detach()? Your answer could be either of the two, neither, or both. Briefly (1-2 lines) justify your answer.

[Question 3] VQ-VAE (5 marks)

Briefly explain what the following line of code does in a vector-quantised variational autoencoder (VQ-VAE) implementation.

quantized = inputs + (quantized - inputs).detach()
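For context, a minimal sketch of where this line typically sits in a VQ-VAE quantization step (variable and function names here are illustrative, not from a specific implementation):

import torch

def quantize(inputs, codebook):
    # inputs: (N, D) encoder outputs; codebook: (K, D) embedding vectors.
    distances = torch.cdist(inputs, codebook)   # (N, K) pairwise L2 distances
    indices = distances.argmin(dim=1)           # index of nearest code per input
    quantized = codebook[indices]               # (N, D) nearest codebook vectors

    # The line in question: the forward pass outputs `quantized`, but the backward pass
    # copies gradients straight from `quantized` to `inputs` (straight-through estimator).
    quantized = inputs + (quantized - inputs).detach()
    return quantized, indices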

[Question 4] FID (5 marks)

The Fréchet Inception Distance (FID) score is a metric used to evaluate the quality of images generated by GANs. Find and read a (short) tutorial about FID and briefly (in 3-5 sentences) explain what it measures and how it is computed.
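As a starting point, FID fits one Gaussian to Inception features of real images and another to features of generated images, then takes the Fréchet distance between the two. A minimal sketch of that final distance computation, assuming the feature means and covariances have already been estimated:

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    # FID = ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 * (sigma_r sigma_g)^(1/2))
    diff = mu_r - mu_g
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):   # discard tiny imaginary parts from numerical error
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))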

[Question 5] Corner Detection (5 marks)

For corner detection, we defined the Second Moment Matrix as follows:

M = \sum_{x} \sum_{y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
Let's denote the 2×2 matrix used in the equation by N; i.e.:

N = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

  1. (1 mark) Compute the eigenvalues of N, denoted by λ1 and λ2.

  2. (4 marks) Prove that matrix M is positive semi-definite.

[Question 6] Optical Flow (5 marks)

Optical flow is problematic in which of the following conditions? Provide a Yes/No answer and a brief explanation for each case.

  1. (1 mark) In homogeneous image areas.

  2. (1 mark) In textured image areas.

  3. (1 mark) At image edges.

  4. (1 mark) At the boundaries of a moving object.

  5. (1 mark) At the corner of a non-moving object.

[Question 7] LSTM (10 marks)

We want to build an LSTM cell that sums its inputs over time. What should the values of the input gate and the forget gate be?
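For reference, the standard LSTM cell-state update is c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, where f_t is the forget gate, i_t the input gate, and \tilde{c}_t the candidate value computed from the current input and previous hidden state.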

[Question 8] GAN training (non-saturating generator cost) (15 marks)

Consider a GAN with generator G(z) and discriminator D(G(z)). The figure below shows the training losses for two different generator loss functions: J1(G) and J2(G). The blue curve plots the value of J1(G) as a function of D(G(z)). Likewise, the red curve plots the value of J2(G) as a function of D(G(z)). For m generated samples, J1(G) and J2(G) are defined as follows:

J_1(G) = \frac{1}{m} \sum_{i=1}^{m} \log\big(D(G(z_i))\big)

J_2(G) = \frac{1}{m} \sum_{i=1}^{m} \log\big(1 - D(G(z_i))\big)

  1. (5 marks) Early in the training, is the value of D(G(z)) closer to 0 or closer to 1? Briefly explain why.

  2. (5 marks) Which of the two cost functions would you choose to train your GAN? Briefly justify your answer.

  3. (5 marks) “A GAN is successfully trained when D(G(z)) is close to 1”. Is this statement TRUE or FALSE? Briefly explain your answer.

(You can use insight learned from this question in your implementation tasks.)

Source for this question: https://coursys.sfu.ca/2020sp-cmpt-980-g2/pages/final-questions/view

Part II: Implementation Tasks (105 marks)

TASK I – GAN (40 marks)

In Tutorial I, we saw a simple GAN implementation where both the generator and the discriminator used fully connected layers. Implement a GAN where the generator uses transposed convolutional layers and the discriminator uses convolutional layers. Make both of them have 5 layers. In the generator, start the first layer as follows:

nn.ConvTranspose2d(in_channels=64, out_channels=512, kernel_size=4, stride=1, padding=0)

and adjust the parameters of the following 4 transposed conv layers to map a 64-dim latent vector into a 28 × 28 grayscale image. The rest of the generator can be kept similar to that of Tutorial I, i.e. batchnorm and ReLU after the first 4 transposed convolutional layers and a sigmoid after the last.
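One possible way to complete the generator is sketched below; the channel counts and kernel/stride choices after the given first layer are assumptions picked so the spatial size goes 1 → 4 → 7 → 14 → 28, and other valid choices exist:

import torch.nn as nn

# Possible 5-layer transposed-conv generator: 64-dim latent -> 1 x 28 x 28 image.
generator = nn.Sequential(
    nn.ConvTranspose2d(64, 512, kernel_size=4, stride=1, padding=0),    # 1x1  -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1),   # 4x4  -> 7x7
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),   # 7x7  -> 14x14
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),    # 14x14 -> 28x28
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 1, kernel_size=3, stride=1, padding=1),      # 28x28, 1 channel
    nn.Sigmoid(),
)
# Usage: generator(z.view(-1, 64, 1, 1)) for a batch of 64-dim latent vectors z.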

For the discriminator, start the first layer as follows:

nn.Conv2d(in_channels=3, out_channels=16, kernel_size=4, stride=2, padding=0)

and adjust the parameters of the following 4 conv layers to map a 28 × 28 grayscale image into a 64-dim latent vector.
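A matching sketch for the discriminator; the channel counts and strides after the given first layer are assumptions. The sketch uses in_channels=1 since MNIST images are single-channel (the handout's snippet lists in_channels=3), and it assumes a small linear head on top of the 64-dim feature to produce the real/fake probability:

import torch.nn as nn

# Possible 5-layer conv discriminator: 1 x 28 x 28 image -> 64-dim feature vector.
features = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=0),     # 28x28 -> 13x13
    nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),    # 13x13 -> 6x6
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),    # 6x6  -> 3x3
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0),   # 3x3  -> 1x1
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 64, kernel_size=1, stride=1, padding=0),   # 64-dim feature at 1x1
)
# Assumed head turning the 64-dim feature into a real/fake probability:
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid())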

Train this GAN on MNIST and compare your results with those of the simple GAN in Tutorial I.

Task II – For this task, you can choose to complete either Task II.a or Task II.b. Do not do both; we will only mark one of them.

Task II.a – WGAN (40 marks)

For this question – and only this question – you are allowed to use AI code generation as much as you want. You can also ask your favourite LLM (e.g. ChatGPT) what steps you need to take to modify a GAN implementation into a WGAN. If you do use any AI tools, mention them.

The Wasserstein GAN (or WGAN) is a GAN variant from 2017 that aims to get rid of problems like mode collapse and improve the training stability of GANs.

  1. Modify the code in Tutorial I (or write your own code) to implement a WGAN (a sketch of the typical modifications is given after this list). Train this WGAN on MNIST and compare your results with those of the simple GAN in Tutorial I.

  2. Briefly explain if/how training this WGAN was different from training your conv GAN in the previous task.
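For orientation, a minimal sketch of the modifications usually involved (a critic without a sigmoid output, the Wasserstein loss, weight clipping, and several critic updates per generator update); the hyperparameters follow the original WGAN recipe, and the model/optimizer names are placeholders:

import torch

def wgan_step(critic, generator, opt_c, opt_g, real_batch, latent_dim,
              n_critic=5, clip_value=0.01):
    # One WGAN training iteration; critic = discriminator with no sigmoid output.
    for _ in range(n_critic):
        z = torch.randn(real_batch.size(0), latent_dim, 1, 1)
        fake = generator(z).detach()
        # Critic maximizes E[critic(real)] - E[critic(fake)], so minimize the negative.
        loss_c = -(critic(real_batch).mean() - critic(fake).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        # Crude Lipschitz constraint via weight clipping.
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)
    z = torch.randn(real_batch.size(0), latent_dim, 1, 1)
    loss_g = -critic(generator(z)).mean()   # generator maximizes E[critic(fake)]
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_c.item(), loss_g.item()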

Task II.b – Simple Text-guided Image Generation (40 marks)

In this task, we implement a simple text-guided image generator (a minimal optimization-loop sketch is given after the steps below). To this end,

  1. load a GAN model pre-trained on ImageNet. For example, you can load the vqgan_imagenet_f16_16384 model from https://github.com/CompVis/taming-transformers.

  2. freeze the GAN, but make the random seed (latent) vector trainable.

  3. use CLIP to encode a text prompt and also the generated image from the GAN.

  4. Choose a loss function that tries to match the CLIP encoding of the prompt with that of the GAN-generated image. Backpropagate the loss through the GAN to update the random seed.

  5. use the updated seed to generate an updated image with the GAN, and backpropagate again and so on.

    Using this method, generate images with the following two prompts:

    • “a dog playing with a cat”, and

    • a prompt that you choose.
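A minimal sketch of the optimization loop in steps 2-5; gan_decode (the frozen GAN's latent-to-image mapping) and preprocess_for_clip (resizing/normalizing images for CLIP) are placeholder helpers you would write for your chosen checkpoints, and clip_model is a loaded, frozen CLIP model exposing encode_text / encode_image:

import torch
import torch.nn.functional as F

def optimize_latent(gan_decode, clip_model, preprocess_for_clip,
                    text_tokens, latent, steps=300, lr=0.05):
    latent = latent.clone().requires_grad_(True)       # only the seed is trainable
    optimizer = torch.optim.Adam([latent], lr=lr)
    with torch.no_grad():
        text_emb = F.normalize(clip_model.encode_text(text_tokens), dim=-1)
    for _ in range(steps):
        image = gan_decode(latent)                     # frozen GAN: latent -> image
        image_emb = F.normalize(
            clip_model.encode_image(preprocess_for_clip(image)), dim=-1)
        # Cosine-distance loss between the prompt embedding and the image embedding.
        loss = 1.0 - (image_emb * text_emb).sum(dim=-1).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return latent.detach()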

Task III – Corner Detection (25 marks)

Download two images (I1 and I2) of the Sandford Fleming Building taken under two different viewing directions:

  • https://commons.wikimedia.org/wiki/File:University_College,_University_of_Toronto.jpg

  • https://commons.wikimedia.org/wiki/File:University_College_Lawn,_University_of_Toronto,_Canada.jpg

  1. Calculate the eigenvalues of the Second Moment Matrix (M) for each pixel of I1 and I2 (a minimal sketch for this step is given after the list).

  2. Show the scatter plot of λ1 and λ2 (where λ1 > λ2) for all the pixels in I1 and the same scatter plot for I2 (5 marks). Each point shown at location (x, y) in the scatter plot corresponds to a pixel with eigenvalues λ1 = x and λ2 = y.

  3. Based on the scatter plots, pick a threshold for min(λ1, λ2) to detect corners. Illustrate detected corners on each image using the chosen threshold (2 marks).

  4. Constructing matrix M involves the choice of a window function w(x, y). Often a Gaussian kernel is used. Repeat steps 1, 2, and 3 above, using a significantly different Gaussian kernel (i.e. a different σ) than the one used before. For example, choose a σ that is significantly (e.g. 5 times, or 10 times) larger than the previous one (3 marks). Explain how this choice influenced the corner detection in each of the images (5 marks).
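A minimal sketch of step 1 using OpenCV and NumPy; the Sobel kernel size and Gaussian σ are illustrative choices, and the eigenvalues come from the closed-form expression for a symmetric 2×2 matrix:

import cv2
import numpy as np

def second_moment_eigenvalues(gray, sigma=2.0):
    # Image gradients.
    Ix = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    Iy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # Entries of M, each smoothed with the Gaussian window w(x, y).
    Ixx = cv2.GaussianBlur(Ix * Ix, (0, 0), sigma)
    Iyy = cv2.GaussianBlur(Iy * Iy, (0, 0), sigma)
    Ixy = cv2.GaussianBlur(Ix * Iy, (0, 0), sigma)
    # Closed-form eigenvalues of [[Ixx, Ixy], [Ixy, Iyy]] at every pixel.
    half_trace = (Ixx + Iyy) / 2.0
    root = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    return half_trace + root, half_trace - root        # lambda1 >= lambda2

# Usage sketch:
#   gray = cv2.imread("I1.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)
#   lam1, lam2 = second_moment_eigenvalues(gray)
#   corners = np.minimum(lam1, lam2) > threshold   # threshold chosen from the scatter plot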
