
[SOLVED] (csc420) assignment 1 to 4 solutions

$25

File Name: _csc420__assignment_1_to_4_solutions.zip
File Size: 339.12 KB


[1.a] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:
x[n] = 1 for −3 ≤ n ≤ 3, and 0 otherwise
h[n] = 1 for −2 ≤ n ≤ 2, and 0 otherwise    (1)
[1.b] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:
x[n] = 1 for −3 ≤ n ≤ 3, and 0 otherwise
h[n] = 2 − |n| for −2 ≤ n ≤ 2, and 0 otherwise    (2)
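Both convolutions are small enough to sanity-check numerically before plotting by hand. Below is a minimal NumPy/matplotlib sketch for part [1.a]; the same pattern works for [1.b] with h = [0, 1, 2, 1, 0]. The use of np.convolve and a stem plot is just one convenient choice, not something the assignment prescribes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Signals from Eq. (1): both are 1 on their supports and 0 elsewhere.
x = np.ones(7)        # x[n] = 1 for -3 <= n <= 3
h = np.ones(5)        # h[n] = 1 for -2 <= n <= 2

y = np.convolve(x, h)       # full convolution, length 7 + 5 - 1 = 11
n = np.arange(-5, 6)        # support of y[n] runs from -3-2 to 3+2

plt.stem(n, y)
plt.xlabel("n")
plt.ylabel("y[n] = (x * h)[n]")
plt.show()
```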
We define a system as something that takes an input signal, e.g. x(n), and produces an output signal, e.g. y(n). Linear Time-Invariant (LTI) systems are a class of systems that are both linear and time-invariant. In linear systems, the output for a linear combination of inputs equals the same linear combination of the individual responses to those inputs. In other words, for a system T, signals x1(n) and x2(n), and scalars a1 and a2, system T is linear if and only if:

T[a1 x1(n) + a2 x2(n)] = a1 T[x1(n)] + a2 T[x2(n)]

Also, a system is time-invariant if a shift in its input merely shifts the output; i.e., if T[x(n)] = y(n), system T is time-invariant if and only if:

T[x(n − n0)] = y(n − n0)

[2.a] (5 marks) Consider a discrete linear time-invariant system T with discrete input signal
x(n) and impulse response h(n). Recall that the impulse response of a discrete system
is defined as the output of the system when the input is an impulse function δ(n), i.e.
T[δ(n)] = h(n), where:
δ(n) = 1 if n = 0, and 0 otherwise.

Prove that T[x(n)] = h(n) ∗ x(n), where ∗ denotes the convolution operation.
Hint: represent signal x(n) as a function of δ(n).

[2.b] (5 marks) Is Gaussian blurring linear? Is it time-invariant? Make sure to include your
justifications.
[2.c] (5 marks) Is time reversal, i.e. T[x(n)] = x(−n), linear? Is it time-invariant? Make
sure to include your justifications.

Vectors can be used to represent polynomials. For example, the 3rd-degree polynomial a3x³ + a2x² + a1x + a0 can be represented by the vector [a3, a2, a1, a0].
If u and v are vectors of polynomial coefficients, prove that convolving them is equivalent to
multiplying the two polynomials they each represent.
Hint: You need to assume proper zero-padding to support the full-size convolution.
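As a quick numerical illustration of the claim (not the proof the question asks for), convolving two coefficient vectors with np.convolve gives exactly the coefficients of the product polynomial; the example polynomials below are made up.

```python
import numpy as np

u = np.array([3, 0, 2, 1])    # hypothetical example: 3x^3 + 2x + 1
v = np.array([1, -4, 5])      # hypothetical example: x^2 - 4x + 5

conv = np.convolve(u, v)      # convolution of the coefficient vectors
prod = np.polymul(u, v)       # NumPy's polynomial multiplication

print(conv)                        # [  3 -12  17  -7   6   5]
print(np.array_equal(conv, prod))  # True
```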
The Laplace operator is a second-order differential operator in the n-dimensional Euclidean space, defined as the divergence (∇·) of the gradient (∇f). Thus, if f is a twice-differentiable real-valued function, then the Laplacian of f is defined by:

∆f = ∇²f = ∇ · ∇f = Σ_{i=1}^{n} ∂²f/∂x_i²

where the latter notation derives from formally writing:

∇ = (∂/∂x1, . . . , ∂/∂xn).

Now, consider a 2D image I(x, y) and its Laplacian, given by ∆I = Ixx + Iyy. Here the second
partial derivatives are taken with respect to the directions of the variables x, y associated
with the image grid for convenience. Show that the Laplacian is in fact rotation invariant. In other words, show that ∆I = Irr + Ir′r′, where r and r′ are any two orthogonal directions.
Hint: Start by using polar coordinates to describe a chosen location (x, y). Then use the chain rule.

Using the sample code provided in Tutorial 2, examine the sensitivity of the Canny edge
detector to Gaussian noise. To do so, take an image of your choice, and add i.i.d Gaussian
noise to each pixel. Analyze the performance of the edge detector as a function of noise variance. Include your observations and three sample outputs (corresponding to low, medium,
and high noise variances) in the report.
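A minimal OpenCV sketch of one way to run this experiment; the file name, the noise standard deviations, and the Canny thresholds (100, 200) are arbitrary placeholders to adapt to your own image.

```python
import cv2
import numpy as np

img = cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file name

for sigma in (5, 20, 50):                            # low, medium, high noise std (assumed)
    noise = np.random.normal(0, sigma, img.shape)
    noisy = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    edges = cv2.Canny(noisy, 100, 200)               # example thresholds
    cv2.imwrite(f"canny_sigma{sigma}.png", edges)
```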
In this question, the goal is to implement a rudimentary edge detection process that uses a derivative of Gaussian, through a series of steps. For each step (excluding step I) you are supposed to test your implementation on the provided image, and also on one image of your own choice. Include the results in your report.

Step I – Gaussian Blurring (10 marks): Implement a function that returns a 2D Gaussian matrix for a given filter size and scale σ. Please note that you should not use any of the existing libraries to create the filter, e.g. cv2.getGaussianKernel(). Moreover, visualize this 2D Gaussian matrix for two choices of σ with appropriate filter sizes. For the visualization, you may consider a 2D image with a colormap, or a 3D graph. Make sure to include the color bar or axis values.
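One possible from-scratch construction of the kernel (normalising the weights to sum to 1 is a common convention, not something the handout requires), together with a 2D colormap visualization:

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(size, sigma):
    """Return a size x size 2D Gaussian matrix with scale sigma, built without cv2."""
    ax = np.arange(size) - (size - 1) / 2.0          # coordinates centred on the middle pixel
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()                      # normalise so the weights sum to 1

plt.imshow(gaussian_kernel(21, 3.0), cmap="viridis")  # example size and sigma
plt.colorbar()
plt.show()
```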
Step II – Gradient Magnitude (10 marks): In the lectures, we discussed how partial derivatives of an image are computed. We know that edges in an image arise from sudden changes of intensity, and one way to capture such a sudden change is to calculate the gradient magnitude at each pixel. The edge strength or gradient magnitude is defined as:

g(x, y) = |∇f(x, y)| = √(gx² + gy²)

where gx and gy are the gradients of image f(x, y) along the x- and y-axis directions, respectively. Using the Sobel operator, gx and gy can be computed as:

gx = [ −1  0  1 ;  −2  0  2 ;  −1  0  1 ] ∗ f(x, y)
gy = [ −1 −2 −1 ;   0  0  0 ;   1  2  1 ] ∗ f(x, y)

Implement a function that receives an image f(x, y) as input and returns its gradient magnitude g(x, y) as output using the Sobel operator. You are supposed to implement the convolution required for this task from scratch, without using any existing libraries.
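A sketch of what "from scratch" can look like: a naive zero-padded 2D convolution plus the two Sobel kernels above. The 'same'-size output and float64 accumulation are implementation choices, not requirements.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution with zero padding and 'same' output size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    flipped = np.flip(kernel)                          # true convolution flips the kernel
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

def gradient_magnitude(image):
    """Edge strength g(x, y) = sqrt(gx^2 + gy^2) using the Sobel kernels."""
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)
    gx = conv2d(image, sobel_x)
    gy = conv2d(image, sobel_y)
    return np.sqrt(gx**2 + gy**2)
```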
Step III – Threshold Algorithm (20 marks): After finding the image gradient, the next step is to automatically find a threshold value so that edges can be determined. One algorithm to automatically determine an image-dependent threshold is as follows:

1. Let the initial threshold τ0 be equal to the average intensity of the gradient image g(x, y), as defined below:

τ0 = ( Σ_{j=1}^{h} Σ_{i=1}^{w} g(i, j) ) / (h × w)

where h and w are the height and width of the image under consideration.

2. Set the iteration index i = 0, and categorize the pixels into two classes, where the lower class consists of the pixels whose gradient magnitudes are less than τ0, and the upper class contains the rest of the pixels.

3. Compute the average gradient magnitudes mL and mH of the lower and upper classes, respectively.

4. Set the iteration index i = i + 1 and update the threshold value as:

τi = (mL + mH) / 2

5. Repeat steps 2 to 4 until |τi − τi−1| ≤ ϵ is satisfied, where ϵ → 0; take τi as the final threshold and denote it by τ.

Once the final threshold is obtained, each pixel of the gradient image g(x, y) is compared with τ. The pixels with a gradient higher than τ are considered edge points and are represented as white pixels; otherwise, they are designated as black. The edge-mapped image E(x, y) thus obtained is:

E(x, y) = 255 if g(x, y) ≥ τ, and 0 otherwise

Implement the aforementioned threshold algorithm. The input to this algorithm is the gradient image g(x, y) obtained from Step II, and the output is a black and white edge-mapped image E(x, y).
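A direct transcription of steps 1-5 into NumPy; the stopping tolerance eps is an assumed small value, and the empty-class guards are there only to keep the sketch from dividing by zero.

```python
import numpy as np

def auto_threshold(g, eps=1e-3):
    """Iterative, image-dependent threshold for a gradient image g(x, y)."""
    tau = g.mean()                                     # step 1: initial threshold
    while True:
        lower, upper = g[g < tau], g[g >= tau]         # step 2: two classes
        m_low = lower.mean() if lower.size else 0.0    # step 3: class means
        m_high = upper.mean() if upper.size else 0.0
        tau_new = (m_low + m_high) / 2.0               # step 4: updated threshold
        if abs(tau_new - tau) <= eps:                  # step 5: stop when the change vanishes
            return tau_new
        tau = tau_new

def edge_map(g):
    """Black-and-white edge map E(x, y) obtained from the final threshold."""
    tau = auto_threshold(g)
    return np.where(g >= tau, 255, 0).astype(np.uint8)
```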
Step IV – Test (10 marks): Use the image provided along with this assignment, and also one image of your choice to test all the previous steps (I to III) and to visualize your results in the report. Convert the images to grayscale first. Please note that the input to each step is the output of the previous step. In a brief paragraph, discuss how the algorithm works for these two examples and highlight its strengths and/or weaknesses.
In Gaussian pyramids, the image at each level Ik is constructed by blurring the image at the previous level Ik−1 and downsampling it by a factor of 2. A Laplacian pyramid, on the other hand, consists of the difference between the image at each level (Ik) and the upsampled version of the image at the next level of the Gaussian pyramid (Ik+1).
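For reference, this is how the two pyramids described above are typically built; the use of cv2.pyrDown/cv2.pyrUp (and float arithmetic so the differences can be negative) is one convenient choice rather than the assignment's prescribed method.

```python
import cv2
import numpy as np

def build_pyramids(I0, levels):
    """Gaussian pyramid I_0..I_levels and Laplacian pyramid L_0..L_{levels-1}."""
    gaussian = [I0.astype(np.float32)]
    for k in range(levels):
        gaussian.append(cv2.pyrDown(gaussian[k]))      # blur, then downsample by 2
    laplacian = []
    for k in range(levels):
        up = cv2.pyrUp(gaussian[k + 1], dstsize=gaussian[k].shape[1::-1])
        laplacian.append(gaussian[k] - up)             # L_k = I_k - upsample(I_{k+1})
    return gaussian, laplacian
```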
Given an image of size 2^n × 2^n denoted by I0, and its Laplacian pyramid representation denoted by L0, …, Ln−1, show how we can reconstruct the original image using the minimum information from the Gaussian pyramid. Specify the minimum information required from the Gaussian pyramid and a closed-form expression for reconstructing I0.
Hint: The reconstruction follows a recursive process; what is the base case that contains the minimum information?

Show that in a fully connected neural network with linear activation functions, the number of layers has effectively no impact on the network.
Hint: Express the output of the network as a function of its inputs and the weights of its layers.
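This is not the proof the question asks for, but a quick numerical illustration of the claim (biases omitted): composing two linear layers is itself a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                  # arbitrary input
W1 = rng.standard_normal((16, 8))           # layer 1 weights, no nonlinearity
W2 = rng.standard_normal((4, 16))           # layer 2 weights

two_layers = W2 @ (W1 @ x)                  # output of the two-layer linear network
one_layer = (W2 @ W1) @ x                   # equivalent single layer with W = W2 W1

print(np.allclose(two_layers, one_layer))   # True
```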
Consider a neural network that represents the following function:

ŷ = σ(w5 σ(w1 x1 + w2 x2) + w6 σ(w3 x3 + w4 x4))

where xi denotes the input variables, ŷ is the output variable, and σ is the logistic function:

σ(x) = 1 / (1 + e^(−x)).

Suppose the loss function used for training this neural network is the L2 loss, i.e. L(y, ŷ) = (y − ŷ)². Assume that the network has its weights set as:

(w1, w2, w3, w4, w5, w6) = (−0.65, −0.55, 1.74, 0.79, −0.13, 0.93)

[3.a] (5 marks) Draw the computational graph for this function. Define appropriate intermediate variables on the computational graph.
[3.b] (5 marks) Given an input data point (x1, x2, x3, x4) = (1.2, −1.1, 0.8, 0.7) with true
label of 1.0, compute the partial derivative ∂L/∂w3 by using the back-propagation algorithm. Indicate the partial derivatives of your intermediate variables on the computational graph. Round all your calculations to 4 decimal places.
Hint: For any vector (or scalar) x, we have ∂/∂x (||x||₂²) = 2x. Also, you do not need to write any code for this question! You can do it by hand.

In this problem, our goal is to estimate the computation overhead of CNNs by counting
the FLOPs (floating point operations). Consider a convolutional layer C followed by a max
pooling layer P. The input of layer C has 50 channels, each of which is of size 12×12. Layer
C has 20 filters, each of which is of size 4 × 4. The convolution padding is 1 and the stride is
2. Layer P performs max pooling over each of the C’s output feature maps, with 3 × 3 local
receptive fields, and stride 1.

Given scalar inputs x1, x2, …, xn, we assume:
• A scalar multiplication xi · xj accounts for one FLOP.
• A scalar addition xi + xj accounts for one FLOP.
• A max operation max(x1, x2, …, xn) accounts for n − 1 FLOPs.
• All other operations do not account for FLOPs.
How many FLOPs do layers C and P conduct in total during one forward pass, with and without accounting for bias?

The following CNN architecture is one of the most influential architectures presented in the 90s. Count the total number of trainable parameters in this network. Note
that the Gaussian connections in the output layer can be treated as a fully connected layer
similar to F6.

For backpropagation in a neural network with a logistic activation function, show that, in
order to compute the gradients, as long as we have the outputs of the neurons, there is no
need for the inputs.
Hint: Find the derivative of a neuron's output with respect to its inputs.

One alternative to the logistic activation function is the hyperbolic tangent function:
tanh(x) = (1 − e^(−2x)) / (1 + e^(−2x)).

• (a) What is the output range of this function, and how does it differ from the output range of the logistic function?
• (b) Show that its gradient can be formulated as a function of the logistic function.
• (c) When do we want to use each of these activation functions?

In this question, we train (or fine-tune) a few different neural network models to classify
dog breeds. We also investigate their dataset bias and cross-dataset performances. All the
tasks should be implemented using Python with a deep learning package of your choice, e.g.
PyTorch or TensorFlow.

We use two datasets in this assignment:
1. Stanford Dogs Dataset
2. Dog Breed Images

The Stanford Dogs Dataset (SDD) contains over 20,000 images of 120 different dog breeds.
The annotations available for this dataset include class labels (i.e. dog breed name) and
bounding boxes. In this assignment, we’ll only be using the class labels. Further, we will
only use a small portion of the dataset (as described below) so you can train your models on
Colab. Dog Breed Images (DBI) is a smaller dataset containing images of 10 different dog
breeds.

To prepare the data for the implementation tasks, follow these steps:
1- Download both datasets and unzip them. There are 7 dog breeds that appear in both
datasets:
• Bernese mountain dog
• Border collie
• Chihuahua
• Golden retriever
• Labrador retriever
• Pug
• Siberian husky

2- Delete the folders associated with the remaining dog breeds in both datasets. You can
also delete the folders associated with the bounding boxes in the SDD.

3- For the 7 breeds that are present in both datasets, the names might be written slightly
differently (e.g. Labrador Retriever vs. Labrador). Manually rename the folders so the
names match (e.g. make them both labrador retriever).

4- Rename the folders to indicate that they are subsets of the original datasets (to avoid
potential confusion if you later want to use them for another project). For example, SDDsubset and DBIsubset. Each of these should now contain 7 subfolders (e.g. border collie, pug,
etc.) and the names should match.

5- Zip the two folders (e.g. SDDsubset.zip and DBIsubset.zip) and upload them to your
Google Drive.

You can find sample code working with the SDD on the internet. If you want, you are
welcome to look at these examples (particularly the one linked here) and use them as your
starting code or use code snippets from them. You will need to modify the code as our questions are asking you to do different tasks, which are not the same as the ones in these online
examples. But using and copying code snippets from these resources is fine. If you choose
to use this (or any other online example) as your starting code, please acknowledge them in
your submission. We also suggest that before starting to modify the starting code, you run
them as is on your data (e.g. DBIsubset) to 1) make sure your dataset setup is correct and
2) to make sure you fully understand the starter code before you start modifying it.

Look at the images in both datasets, and briefly explain whether you observe any systematic differences between images in one dataset vs. the other.

Construct a simple convolutional neural network (CNN) for classifying the images in the SDD.
For example, you can construct a network as follows:
• convolutional layer – 16 filters of size 3×3
• batch normalization
• convolutional layer – 16 filters of size 3×3
• max pooling (2×2)
• convolutional layer – 8 filters of size 3×3
• batch normalization
• convolutional layer – 8 filters of size 3×3
• max pooling (2×2)
• dropout (e.g. 0.5)
• fully connected (32)
• dropout (0.5)
• softmax
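One possible PyTorch rendering of the specification above; the 64×64 RGB input resolution, the padding, and the 7-way output (the breeds shared by both datasets) are assumptions to adapt to your own preprocessing.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(0.5),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 32), nn.ReLU(),   # assumes 64x64 inputs -> 16x16 feature maps
    nn.Dropout(0.5),
    nn.Linear(32, 7),                        # softmax is folded into nn.CrossEntropyLoss
)
```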
If you want, you can change these specifications; but if you do so, please specify them in your submission. Use ReLU as your activation function and cross-entropy as your cost function. Train the model with the optimizer of your choice, e.g. SGD, Adam, or RMSProp. Use random cropping, random horizontal flipping, and random rotations for augmentation. Make sure to tune the parameters of your optimizer to get the best performance on the validation set.
Plot the training and test accuracy over the first 10 epochs. Note that the accuracy is different from the loss function; the accuracy is defined as the percentage of images classified correctly.

Train the same CNN model again; this time, with dropout. Plot the training and test accuracy over the first 10 epochs, and compare them with the model trained without dropout.
Report the impact of dropout on the training and its generalization to the test set.

[III.a] (15 marks) ResNet models were proposed in the “Deep Residual Learning for Image
Recognition” paper. These models have had great success in image recognition on benchmark
datasets. In this task, we use the ResNet-18 model for the classification of the images in the
DBI dataset. To do so, use the ResNet-18 model from PyTorch, modify the input/output
layers to match your dataset, and train the model from scratch; i.e., do not use the pretrained ResNet. Plot the training, validation, and testing accuracy, and compare those with
the results of your CNN model.
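A minimal sketch of loading an untrained ResNet-18 from torchvision and swapping its 1000-way ImageNet head for the 7 classes used here. Newer torchvision versions take weights=None (older ones use pretrained=False); the same pattern, with pretrained weights, applies to the fine-tuning task later on.

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)            # random init, i.e. trained from scratch
model.fc = nn.Linear(model.fc.in_features, 7)    # replace the ImageNet classifier head
```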
[III.b] (10 marks) Run the trained model on the entire SDD dataset and report the accuracy. Compare the accuracy obtained on the (test set of) DBI vs. the accuracy obtained on the
SDD. Which is higher? Why do you think that might be? Explain very briefly, in one or two
sentences.

Similar to the previous task, use the following three models from PyTorch: ResNet18,
ResNet34, and ResNeXt32. This time you are supposed to use the pre-trained models and fine-tune the input/output layers on the DBI training data. Report the accuracy of these fine-tuned models on the DBI test dataset, and also on the entire SDD dataset. Discuss the cross-performance of these trained models. For example, are there cases in which two different models perform equally well on the test portion of the DBI but have significant performance differences when evaluated on the SDD?

Train a model that – instead of classifying dog breeds – can distinguish whether a given
image is more likely to belong to SDD or DBI. To do so, first, you need to divide your
data into training and test data (and possibly validation if you need those for tuning the
hyperparameters of your model). You need to either reorganize the datasets (to load the images using torchvision.datasets.ImageFolder) or write your own data loader function. Train your model on the training portion of the dataset. Include your network model specifications in the report, and make sure to include your justifications for that choice. Report your model's accuracy on the test portion of the dataset.

The Laplacian of Gaussian operator is defined as:
∇²G(x, y, σ) = ∂²G(x, y, σ)/∂x² + ∂²G(x, y, σ)/∂y² = (1/(πσ⁴)) ((x² + y²)/(2σ²) − 1) e^(−(x² + y²)/(2σ²)),

where the Gaussian filter G is:

G(x, y, σ) = (1/(2πσ²)) e^(−(x² + y²)/(2σ²))

The characteristic scale is defined as the scale that produces the peak value (minimum or
maximum) of the Laplacian response.

1. (10 marks) What scale (i.e. what value of σ) maximizes the magnitude of the response of the Laplacian filter to an image of a black circle with diameter D on a white background? Justify your answer.
2. (5 marks) What scale should we use if we want to instead detect a white circle of the same size on a black background?
3. (10 marks) Experimentally find the value of σ that maximizes the magnitude of the response for a black square of size 100×100 pixels on a sufficiently large white background.
Hint: You can simply implement this task and automatically test a large set of samples. You may also want to first generate the samples in the log domain to accurately locate the optimal value over a large spectrum.
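A sketch of the experiment in part 3, assuming the usual scale-normalised response σ²·LoG (scipy's gaussian_laplace plus a log-spaced sigma sweep); the image size and sigma range are placeholders.

```python
import numpy as np
from scipy import ndimage

img = np.full((500, 500), 255.0)              # large white background
img[200:300, 200:300] = 0.0                   # 100x100 black square

sigmas = np.logspace(0, 2, 200)               # candidate sigmas, sampled in the log domain
responses = [np.abs(s**2 * ndimage.gaussian_laplace(img, sigma=s)).max() for s in sigmas]

print("sigma with the strongest response:", sigmas[int(np.argmax(responses))])
```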
For corner detection, we defined the Second Moment Matrix as follows:

M = Σx Σy w(x, y) [ Ix²   IxIy ;  IxIy   Iy² ]

Let's denote the 2×2 matrix used in the equation by N; i.e.:

N = [ Ix²   IxIy ;  IxIy   Iy² ]

1. (10 marks) Compute the eigenvalues of N, denoted by λ1 and λ2.
2. (15 marks) Prove that matrix M is positive semi-definite.

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision
and image processing for the purpose of object detection. The technique counts occurrences
of gradient orientation in localized portions of an image. This method is similar to that of
scale-invariant feature transform (SIFT) descriptors, and shape contexts (a similar technique
we have not seen in class), but differs in the sense that it is computed on a dense grid
of uniformly spaced cells and uses overlapping local contrast normalization for improved
accuracy. Until deep learning, HOG was one of the long-standing top representations for
object detection.

In this assignment, you will implement a variant of HOG. Given an input image, your
algorithm will compute the HOG feature and visualize it as shown in Figure 1 (the line
directions are perpendicular to the gradient to show edge alignment).
Figure 1: HOG features plotted on an example image.

The orientation and magnitude of the red lines represent the gradient components in a
local cell. A HOG descriptor is formed at a specified image location as follows:
1. Compute image gradient magnitudes and directions over the whole image, thresholding
small gradient magnitudes to zero. You should empirically set a reasonable value for
the threshold for each of the input images.

2. Center a cell grid (m × n) on the image. To create this grid, assume the grid cells
are square and we have a fixed-size length for each of the cells in this grid; let us call
that size τ . For example, if your image size is 1021 ×975 and τ = 8, then you will have
a grid size of (m = 127) × (n = 121). You can ignore the boundary of the image that
cannot fit into the grid (on either end), i.e., just consider the crop of the image that fits the grid perfectly, which is 1016 × 968 in this example.

3. For each cell, form an orientation histogram by quantizing the gradient directions and,
for each such orientation bin, add the (thresholded) gradient magnitudes. This process can be done in two steps: imagine the gradient orientations are discretized into 6 bins:

[−15°, 15°), [15°, 45°), [45°, 75°), [75°, 105°), [105°, 135°), [135°, 165°)

Remember that 165° is equivalent to −15°, since the orientation is not directed. Now create a 3D array (m × n × 6) where in element (i, j, k) of this 3D array you will store the accumulated gradient magnitudes over all the pixels in cell (i, j) with gradient orientations corresponding to bin k.

Another approach for constructing the HOG is to collect the number of occurrences in each bin, rather than accumulating the magnitudes of the occurrences; i.e. in element (i, j, k) of the histogram, we store the number of pixels in cell (i, j) with gradient orientations corresponding to bin k.
Choose reasonable values for the threshold and cell size, and then visualize the resulting 3D arrays (using both approaches) on the given images, similar to the quiver plot of Figure 1. Briefly compare the two approaches by inspecting the visualizations. (15 marks)
Hint: You can use any package/function for creating the visualization in Figure 1. One way to do that is to superimpose 6 quiver plots (one for each bin), generated by the quiver function in the matplotlib package.
For the remaining tasks, you can use either approach for constructing the HOG (a minimal sketch of the construction appears after this list). Make sure to explicitly mention your choice in the report.

4. To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks (adjacent cells). Given the histogram of oriented gradients, you apply L2 normalization as follows:
• Build a descriptor for the first block by concatenating the HOG within the block.
You can use block size = 2, i.e., 2 × 2 block will contain 2 × 2 × 6 entries that will
be concatenated to form one long vector.
• Normalize the descriptor as follows:

ĥ_i = h_i / √( Σ_i h_i² + e² )

where h_i is the i-th element of the vector and ĥ_i is the normalized histogram. e is the normalization constant to prevent division by zero (e.g., e = 0.001).
• Assign the normalized histogram to the first cell of a new histogram array, i.e. cell
(1,1).
• Move to the next block of the old histogram array with stride 1 and iterate steps 1-3 above to compute the next cell of the new histogram array.

The resulting new histogram array will have the size of (m − 1) × (n − 1) × 24. Compute normalized histogram arrays for the provided images, and store them in a single-line text file where the data is stored row by row (first row, then second row, etc.). Submit both your code and the files that are generated by your code. Please note that the file should have the same name as the image (e.g. ‘image.jpg’ → ‘image.txt’). (15 marks)
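A minimal sketch of the construction above (magnitude-weighted variant): per-cell 6-bin orientation histograms followed by 2×2 block L2 normalisation. The gradient threshold, the use of cv2.Sobel, and the loop-based implementation are assumptions/simplifications rather than the required design.

```python
import numpy as np
import cv2

def hog_cells(gray, tau=8, mag_thresh=20.0):
    """m x n x 6 array of per-cell orientation histograms (accumulated magnitudes)."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    mag = np.sqrt(gx**2 + gy**2)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # undirected orientations in [0, 180)
    mag[mag < mag_thresh] = 0.0                          # threshold small gradients to zero

    m, n = gray.shape[0] // tau, gray.shape[1] // tau    # grid size; leftover border ignored
    bins = (((ang + 15.0) % 180.0) // 30.0).astype(int)  # bin 0 = [-15, 15), bin 1 = [15, 45), ...
    hist = np.zeros((m, n, 6))
    for i in range(m):
        for j in range(n):
            cell_mag = mag[i * tau:(i + 1) * tau, j * tau:(j + 1) * tau]
            cell_bin = bins[i * tau:(i + 1) * tau, j * tau:(j + 1) * tau]
            for k in range(6):
                hist[i, j, k] = cell_mag[cell_bin == k].sum()
    return hist

def block_normalize(hist, e=0.001):
    """(m-1) x (n-1) x 24 array of L2-normalised 2x2-block descriptors."""
    m, n, _ = hist.shape
    out = np.zeros((m - 1, n - 1, 24))
    for i in range(m - 1):
        for j in range(n - 1):
            v = hist[i:i + 2, j:j + 2, :].ravel()        # concatenate the 2x2 block (24 values)
            out[i, j, :] = v / np.sqrt((v**2).sum() + e**2)
    return out
```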
In addition to the provided images, use your own camera (e.g. smartphone camera) to capture two images of the same scene, one with flash and one without flash. Convert the
images to gray-scale, and down-sample the images if needed to avoid excessive computation
overhead.

First, compute the original HOG arrays for these two images and visualize them similar
to Figure 1. (5 marks)
Second, compute the normalized histogram arrays for each of these two images, and store
them in two txt files as instructed earlier. (5 marks)

Third, by comparing the results, argue why the normalization of HOG may or may not
be beneficial. Limit your discussion to a paragraph, containing the main points. You can
compare the histograms visually or you may want to define a quantifiable measure to compare
the histograms for the pair of with-flash/no-flash images. If you choose to visually compare,
provide the details of your visualization approach for normalized HOG; alternatively, if you
decide to quantitatively compare the histograms, include the function you used and your
justification in the report. (20 marks)

Download two images (I1 and I2) of the Sandford Fleming Building taken under two
different viewing directions:
• https://commons.wikimedia.org/wiki/File:University College, University of Toronto.jpg
• https://commons.wikimedia.org/wiki/File:University College Lawn, University of Toronto, Canada.jpg
1. Calculate the eigenvalues of the Second Moment Matrix (M) for each pixel of I1 and
I2.
2. Show the scatter plot of λ1 and λ2 for all the pixels in I1 (5 marks) and the same
scatter plot for I2 (5 marks). Each point shown at location (x, y) in the scatter plot,
corresponds to a pixel with eigenvalues λ1 = x and λ2 = y.
3. Based on the scatter plots, pick a threshold for min(λ1, λ2) to detect corners. Illustrate
detected corners on each image using the chosen threshold (10 marks).
4. Constructing matrix M involves the choice of a window function w(x, y). Often a
Gaussian kernel is used. Repeat steps 1, 2, and 3 above, using a significantly different Gaussian kernel (i.e. a different σ) than the one used before. For example,
choose a σ that is significantly (e.g. 5 times, or 10 times) larger than the previous one
(10 marks). Explain how this choice influenced the corner detection in each of the
images (10 marks).
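A sketch of computing the per-pixel eigenvalues of M with a Gaussian window; the Sobel derivatives, the window σ, and the closed-form 2×2 eigenvalue formula are implementation choices, not requirements.

```python
import numpy as np
import cv2

def corner_eigenvalues(gray, sigma=2.0):
    """Per-pixel eigenvalues (lam1, lam2) of the second moment matrix M."""
    Ix = cv2.Sobel(gray.astype(np.float64), cv2.CV_64F, 1, 0)
    Iy = cv2.Sobel(gray.astype(np.float64), cv2.CV_64F, 0, 1)
    # Each entry of M is a Gaussian-weighted local sum of the gradient products.
    Sxx = cv2.GaussianBlur(Ix * Ix, (0, 0), sigma)
    Syy = cv2.GaussianBlur(Iy * Iy, (0, 0), sigma)
    Sxy = cv2.GaussianBlur(Ix * Iy, (0, 0), sigma)
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[Sxx, Sxy], [Sxy, Syy]].
    tr, det = Sxx + Syy, Sxx * Syy - Sxy**2
    disc = np.sqrt(np.maximum(tr**2 / 4.0 - det, 0.0))
    return tr / 2.0 + disc, tr / 2.0 - disc

# lam1, lam2 = corner_eigenvalues(gray)           # scatter-plot these per pixel
# corners = np.minimum(lam1, lam2) > threshold    # threshold chosen from the scatter plot
```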
We have two images of a planar object (e.g. a painting) taken from different viewpoints and we want to align them. We have used SIFT to find a large number of point correspondences
between the two images and visually estimate that at least 70% of these matches are correct
with only small potential inaccuracies. We want to find the true transformation between the
two images with a probability greater than 99.5%.
1. (5 marks) Calculate the number of iterations needed for fitting a homography.
2. (5 marks) Without calculating, briefly explain whether you think fitting an affine
transformation would require fewer or more RANSAC iterations and why.

Assume a plane passing through point P⃗0 = [X0, Y0, Z0]^T with normal n⃗. The corresponding
vanishing points for all the lines lying on this plane form a line called the horizon. In this
question, you are asked to prove the existence of the horizon line by following the steps below:

1. (15 marks) Find the pixel coordinates of the vanishing point corresponding to a line L, passing through point P⃗0 and going along direction d⃗.
Hint: P⃗ = P⃗0 + t d⃗ are the points on line L, and

p⃗ = [ωx, ωy, ω]^T = K P⃗ = K [X0 + t dx, Y0 + t dy, Z0 + t dz]^T

are the pixel coordinates of the same line in the image, and

K = [ f  0  px ;  0  f  py ;  0  0  1 ]

where f is the camera focal length and (px, py) is the principal point.

2. (15 marks) Prove that the vanishing points of all the lines lying on the plane form a line.
Hint: all the lines on the plane are perpendicular to the plane's normal n⃗; that is, n⃗ · d⃗ = 0, or nx dx + ny dy + nz dz = 0.

Using homogeneous coordinates:
1. (15 marks) (a) Show that the intersection of the 2D lines l and l′ is the 2D point p = l × l′ (here × denotes the cross product).
2. (15 marks) (b) Show that the line that goes through the 2D points p and p′ is l = p × p′.

You are given three images hallway1.jpg, hallway2.jpg, hallway3.jpg which were shot
with the same camera (i.e. same internal camera parameters), but held at slightly different
positions/orientations (i.e. with different external parameters).
Consider the homographies H, with

[w̃x̃, w̃ỹ, w̃]^T = H [x, y, 1]^T,

that map corresponding points of one image I to a second image Ĩ, for three cases:
A. The right wall of I = hallway1.jpg to the right wall of Ĩ = hallway2.jpg.
B. The right wall of I = hallway1.jpg to the right wall of Ĩ = hallway3.jpg.
C. The floor of I = hallway1.jpg to the floor of Ĩ = hallway3.jpg.

For each of these three cases:
1. (10 marks) Use a Data Cursor to select corresponding points by hand. Select more
than four pairs of points. (Four pairs will give a good fit for those points, but may give
a poor fit for other points.) Also, avoid choosing three (or more) collinear points, since
these do not provide independent information. This is trickier for case C. Make two
figures showing the gray-level images of I and Ĩ with a colored square marking each of the selected points. You can convert the image I or Ĩ to gray level using an RGB to grayscale function (or the formula gray = 0.2989 × R + 0.5870 × G + 0.1140 × B).
2. (10 marks) Fit a homography H to the selected points. Include the estimated H in
the report, and describe its effect using words such as scale, shear, rotate, translate,
if appropriate. You are not allowed to use any homography estimation function in
OpenCV or other similar packages (a minimal DLT sketch appears at the end of this question).
3. (10 marks) Make a figure showing the Ĩ image with red squares that mark each of the selected (x̃, ỹ), and green squares that mark the locations of the estimated (x̃, ỹ); that is, use the homography to map the selected (x, y) to the (x̃, ỹ) space.
4. (25 marks) Make a figure showing a new image that is larger than the original one(s).
The new image should be large enough that it contains the pixels of the I image as a
subset, along with all the inverse mapped pixels of the Ĩ image. The new image should be constructed as follows:
• RGB values are initialized to zero,
• The red channel of the new image must contain the rgb2gray values of the I
image (for the appropriate pixel subset only);
• The blue and green channels of the new image must contain the rgb2gray values
of the corresponding pixels (x̃, ỹ) of Ĩ. The correspondence is computed as follows: for each pixel (x, y) in the new image, use the homography H to map this pixel to the (x̃, ỹ) domain (not forgetting to divide by the homogeneous coordinate), and round the value so you get an integer grid location. If this (x̃, ỹ) location indeed lies within the domain of the Ĩ image, then copy the rgb2gray'ed value from that Ĩ(x̃, ỹ) into the blue and green channels of pixel (x, y) in the new image. (This amounts to an inverse mapping.)
If the homography is correct and if the surface were Lambertian∗
then corresponding points in the new image would have the same values of R,G, and B and so the
new image would appear to be gray at these pixels.
• Based on your results, what can you conclude about the relative 3D positions and
orientations of the camera? Give only qualitative answers here. Also, what can
you conclude about the surface reflectance of the right wall and floor, namely are
they more or less Lambertian? Limit your discussion to a few sentences.
(5 marks) Along with your writeup, hand in the program that you used to solve the problem. You should have a switch statement that chooses between cases A, B, and C.
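As referenced in step 2, one standard way to fit H yourself is the direct linear transform (DLT): stack two equations per correspondence and take the null-space direction from an SVD. Hartley point normalisation is omitted for brevity; this is a sketch, not the prescribed method.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 H mapping src -> dst from N >= 4 point pairs (N x 2 arrays)."""
    A = []
    for (x, y), (xt, yt) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, xt * x, xt * y, xt])
        A.append([0, 0, 0, -x, -y, -1, yt * x, yt * y, yt])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                         # remove the arbitrary projective scale

def apply_homography(H, pts):
    """Map (x, y) points with H and divide by the homogeneous coordinate."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]
```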
∗ Lambertian reflectance is the property that defines an ideal “matte” or diffusely reflecting surface. The apparent brightness of a Lambertian surface to an observer is the same regardless
of the observer’s angle of view. Unfinished wood exhibits roughly Lambertian reflectance, but
wood finished with a glossy coat of polyurethane does not, since the glossy coating creates
specular highlights. Specular reflection, or regular reflection, is the mirror-like reflection of
waves, such as light, from a surface. Reflections on still water are an example of specular
reflection.

In Tutorial 10, we learned about mean shift and cam shift tracking. In this question,
we first attempt to evaluate the performance of mean shift tracking in a single case and will
then implement a small variation of the standard mean shift tracking. For both parts you
can use the attached short video KylianMbappe.mp4 or, alternatively, you can record and
use a short (2-3 second) video of yourself. You can use any OpenCV (or other) functions you
want in this question.

1. (20 marks) Performance Evaluation
• Use the Viola-Jones face detector to detect the face on the first frame of the video.
The default detector can detect the face in the first frame of the attached video. If
you record a video of yourself, make sure your face is visible and facing the camera
in the first frame (and throughout the video) so the detector can detect your face
in the first frame.
• Construct the hue histogram of the detected face on the first frame using appropriate saturation and value thresholds for masking. Use the constructed hue
histogram and mean shift tracking to track the bounding box of the face over the
length of the video (from frame #2 until the last frame). So far, this is similar to
what we did in the tutorial.
• Also, use the Viola-Jones face detector to detect the bounding box of the face in
each video frame (from frame #2 until the last frame).
• Calculate the intersection over union (IoU) between the tracked bounding box and
the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis
of the plot should be the frame number (from 2 until the last frame) and the y
axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%)
and another sample frame in which the IoU is low (e.g. below 10%). Draw the
tracked and detected bounding boxes in each frame using different colors (and
indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
• Look at the detected and tracked boxes at frames in which the IoU is small (< 10%)
and report which (Viola-Jones detection or tracked bounding box) is correct more
often (we don’t need a number, just eyeball it). Very briefly (1-2 sentences) explain
why that might be.

2. (10 marks) Implement a Simple Variation
• In the examples in Tutorial 10 (and the previous part of this question) we used
a hue histogram for mean shift tracking. Here, we implement an alternative in
which a histogram of gradient direction values is used instead.
• After converting to grayscale, use blurring and the Sobel operator to first generate image gradients in the x and y directions (Ix and Iy). You can then use
cartToPolar (with angleInDegrees=True) to get the gradient magnitude and
angle at each frame. You can use 24 histogram bins and [0,360] (i.e. not [0,180])
directions.
• When constructing hue histograms, we thresholded the saturation and value channels to create a mask. Here, you can threshold the gradient magnitude to create
a mask. For example, you can mask out pixels in the region of interest in which
the gradient magnitude is less than 10% of the maximum gradient magnitude in
the RoI.
• Calculate the intersection over union (IoU) between the tracked bounding box and
the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis
of the plot should be the frame number (from 2 until the last frame) and the y
axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%)
and another sample frame in which the IoU is low (e.g. below 10%). Draw the
tracked and detected bounding boxes in each frame using different colors (and
indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
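For orientation, the skeleton below follows the standard OpenCV pattern for part 1 (Viola-Jones detection on frame 1, hue back-projection, cv2.meanShift, then IoU against the per-frame detections). The saturation/value mask thresholds are assumed values, and it is taken for granted that the detector fires on the first frame, as the question states for the provided video.

```python
import cv2
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / float(aw * ah + bw * bh - inter)

cap = cv2.VideoCapture("KylianMbappe.mp4")
ok, frame = cap.read()

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
x, y, w, h = cascade.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))[0]
track_window = (x, y, w, h)

hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 60, 32), (180, 255, 255))        # assumed S/V thresholds
hist = cv2.calcHist([hsv], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
ious = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    back = cv2.calcBackProject([cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)], [0], hist, [0, 180], 1)
    _, track_window = cv2.meanShift(back, track_window, term)
    faces = cascade.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    if len(faces):
        ious.append(iou(track_window, tuple(faces[0])))       # plot these against frame number
```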
