Question 1 (4-4-4). Using the definition of the derivative, $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$, and the following definition of the Heaviside step function:
$$H(x) = \begin{cases} 1 & \text{if } x > 0 \\ 1/2 & \text{if } x = 0 \\ 0 & \text{if } x < 0 \end{cases}$$
- Show that the derivative of the rectified linear unit g(x) = max{0,x}, wherever it exists, is equal to the Heaviside step function.
- Give two alternative definitions of g(x) using H(x).
- Show that $H(x)$ is well approximated by the sigmoid function $\sigma(kx) = \frac{1}{1 + e^{-kx}}$ asymptotically (i.e., for large $k$), where $k$ is a parameter (see the numerical sketch after this question).
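The following is a minimal numerical sketch (not part of the required proof) of the last point: it checks that $\sigma(kx)$ approaches $H(x)$ pointwise as $k$ grows, under the convention $H(0) = 1/2$ used above.

```python
import numpy as np

def heaviside(x):
    # H(x) with the convention H(0) = 1/2, matching sigma(k * 0) = 1/2.
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for k in (1, 10, 100):
    # Largest pointwise gap between sigma(kx) and H(x) on the test grid.
    print(k, np.max(np.abs(sigmoid(k * x) - heaviside(x))))
# The gap shrinks for every x != 0 as k increases; at x = 0 both are exactly 1/2.
```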
Question 2 (3-3-3-3). Recall the definition of the softmax function: $S(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$.
- Show that softmax is translation-invariant, that is: $S(x + c) = S(x)$, where $c$ is a scalar constant.
- Show that softmax is not invariant under scalar multiplication. Let $S_c(x) = S(cx)$ where $c \geq 0$. What are the effects of taking $c$ to be 0 and arbitrarily large?
- Let $x$ be a 2-dimensional vector. One can represent a 2-class categorical probability using softmax $S(x)$. Show that $S(x)$ can be reparameterized using the sigmoid function, i.e. $S(x) = [\sigma(z), 1 - \sigma(z)]^\top$, where $z$ is a scalar function of $x$.
- Let $x$ be a $K$-dimensional vector ($K \geq 2$). Show that $S(x)$ can be represented using $K - 1$ parameters, i.e. $S(x) = S([0, y_1, y_2, \ldots, y_{K-1}]^\top)$, where $y_i$ is a scalar function of $x$ for $i \in \{1, \ldots, K-1\}$. (A numerical sketch of these properties follows this question.)
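Below is a small numerical sketch of the claims in this question, using assumed example vectors (the choice of $z = x_1 - x_2$ in the two-class case is my own illustration of the reparameterization):

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # shifting by a constant is safe by translation invariance
    return e / e.sum()

x = np.array([1.0, 2.0, 0.5])
print(np.allclose(softmax(x + 3.0), softmax(x)))    # translation invariance -> True

print(softmax(0.0 * x))      # c -> 0: softmax collapses to the uniform distribution
print(softmax(100.0 * x))    # c large: mass concentrates on the argmax entry

x2 = np.array([1.3, -0.4])
z = x2[0] - x2[1]            # a scalar function of x
sig = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(softmax(x2), [sig, 1.0 - sig]))   # S(x) = [sigmoid(z), 1 - sigmoid(z)] -> True
```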
Question 3 (16). Consider a 2-layer neural network $y : \mathbb{R}^D \to \mathbb{R}^K$ of the form
$$y_k(x; \theta) = \sum_{j} W^{(2)}_{kj}\, \sigma\!\Big(\sum_{i} W^{(1)}_{ji} x_i + b^{(1)}_j\Big) + b^{(2)}_k$$
for $1 \leq k \leq K$, with parameters $\theta = (\theta^{(1)}, \theta^{(2)})$, $\theta^{(l)} = (W^{(l)}, b^{(l)})$, and logistic sigmoid activation function $\sigma$. Show that there exists an equivalent network of the same form, with parameters $\theta' = (\theta'^{(1)}, \theta'^{(2)})$ and tanh activation function, such that $y_{\theta'}(x) = y_{\theta}(x)$ for all $x \in \mathbb{R}^D$, and express $\theta'$ as a function of $\theta$.
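The identity that makes this work is $\sigma(a) = \tfrac{1}{2}\big(1 + \tanh(a/2)\big)$: halving the first-layer parameters turns each hidden sigmoid into a tanh plus a constant, and the constants can be absorbed into the second layer. A quick numerical check of the identity (a sketch; the mapping $\theta \mapsto \theta'$ itself is what the question asks for):

```python
import numpy as np

a = np.linspace(-5.0, 5.0, 11)
sigma = 1.0 / (1.0 + np.exp(-a))
# sigma(a) == (1 + tanh(a / 2)) / 2 for all a
print(np.allclose(sigma, 0.5 * (1.0 + np.tanh(a / 2.0))))   # True
```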
Question 4 (5-5). Fundamentally, back-propagation is just a special case of reverse-mode Automatic Differentiation (AD), applied to a neural network. Based on the three-part notation shown in Tables 1 and 4, represent the evaluation trace and the derivative (adjoint) trace of the following examples. In the last columns of your solution, numerically evaluate the values to 4 decimal places.
- Forward AD, with $y = f(x_1, x_2) = \frac{1}{x_1 + x_2} + x_2^2 + \cos(x_1)$ at $(x_1, x_2) = (3, 6)$, setting $\dot{x}_1 = 1$ to compute $\partial y / \partial x_1$.
- Reverse AD, with $y = f(x_1, x_2) = \frac{1}{x_1 + x_2} + x_2^2 + \cos(x_1)$ at $(x_1, x_2) = (3, 6)$. Setting $\bar{y} = 1$, both $\partial y / \partial x_1$ and $\partial y / \partial x_2$ can be computed together.
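As a sanity check on the numbers that should appear in the traces (a sketch using a hand-rolled dual-number class, not the tabular trace the question asks for), the derivatives of $f$ at $(3, 6)$ can be computed by forward-mode AD and compared against the analytic gradient $\partial y/\partial x_1 = -1/(x_1+x_2)^2 - \sin(x_1)$ and $\partial y/\partial x_2 = -1/(x_1+x_2)^2 + 2x_2$:

```python
import math

class Dual:
    """Minimal forward-mode AD value: a (primal, tangent) pair."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.d + o.d)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.v * o.d + self.d * o.v)
    __rmul__ = __mul__
    def __truediv__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v / o.v, (self.d * o.v - self.v * o.d) / (o.v * o.v))
    def __rtruediv__(self, o):
        return Dual(o) / self

def cos(x):
    return Dual(math.cos(x.v), -math.sin(x.v) * x.d)

def f(x1, x2):
    return 1 / (x1 + x2) + x2 * x2 + cos(x1)

print(round(f(Dual(3.0, 1.0), Dual(6.0, 0.0)).d, 4))  # dy/dx1 = -1/81 - sin(3) ~ -0.1535
print(round(f(Dual(3.0, 0.0), Dual(6.0, 1.0)).d, 4))  # dy/dx2 = -1/81 + 12   ~ 11.9877
```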
Question 5 (6). Compute the full, valid, and same convolution (with kernel flipping) for the following 1D matrices:
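The arrays themselves are not reproduced above, so as an illustration with assumed placeholder inputs: `numpy.convolve` flips the kernel and implements exactly the full, valid, and same modes asked about here.

```python
import numpy as np

# Placeholder 1D signal and kernel (the actual arrays from the assignment are not shown above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])

print(np.convolve(x, w, mode="full"))   # length len(x) + len(w) - 1 = 7
print(np.convolve(x, w, mode="valid"))  # length len(x) - len(w) + 1 = 3
print(np.convolve(x, w, mode="same"))   # length len(x) = 5, the centered part of 'full'
```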
Question 6 (5-5). Consider a convolutional neural network. Assume the input is a color image of size 256 × 256 in the RGB representation. The first layer convolves 64 kernels of size 8 × 8 with the input, using a stride of 2 and no padding. The second layer downsamples the output of the first layer with 5 × 5 non-overlapping max pooling. The third layer convolves 128 kernels of size 4 × 4 with a stride of 1 and zero-padding of size 1 on each border.
- What is the dimensionality (scalar) of the output of the last layer?
- Not including the biases, how many parameters are needed for the last layer?
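A sketch of the shape and parameter bookkeeping for these three layers, using the standard one-dimensional output-size formula $o = \lfloor (i + 2p - d(k-1) - 1)/s \rfloor + 1$ (arithmetic support only, not the graded derivation):

```python
def conv_out(i, k, s=1, p=0, d=1):
    # Output size along one spatial dimension for a convolution or pooling window.
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

h = conv_out(256, k=8, s=2, p=0)   # layer 1: 64 kernels of 8x8, stride 2, no padding
h = conv_out(h, k=5, s=5, p=0)     # layer 2: 5x5 non-overlapping max pooling
h = conv_out(h, k=4, s=1, p=1)     # layer 3: 128 kernels of 4x4, stride 1, padding 1
print(128 * h * h)                 # scalar dimensionality of the last layer's output
print(128 * 64 * 4 * 4)            # weight count of the last layer, biases excluded
```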
Question 7 (4-4-6). Assume we are given data of size 3 × 64 × 64. In what follows, provide a correct configuration of a convolutional neural network layer that satisfies the specified assumption. Answer with the kernel window size (k), stride (s), padding (p), and dilation (d, with the convention d = 1 for no dilation). Use square windows only (i.e. the same k for both width and height).
- The output shape (o) of the first layer is (64, 32, 32).
  (a) Assume k = 8 without dilation.
  (b) Assume d = 7 and s = 2.
- The output shape of the second layer is (64, 8, 8). Assume p = 0 and d = 1.
  (a) Specify k and s for pooling with a non-overlapping window.
  (b) What is the output shape if k = 8 and s = 4 instead?
- The output shape of the last layer is (128, 4, 4).
  (a) Assume we are not using padding or dilation.
  (b) Assume d = 2, p = 2.
  (c) Assume p = 1, d = 1.
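The same output-size formula with dilation can be used to check candidate configurations against the target output shapes. The helper below is a sketch (the function name `find_configs` and the search ranges are my own choices); it brute-forces small (k, s, p, d) combinations using the question's notation.

```python
def conv_out(i, k, s=1, p=0, d=1):
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

def find_configs(i, o, ks, ss, ps, ds):
    """Enumerate (k, s, p, d) combinations mapping input size i to output size o."""
    return [(k, s, p, d)
            for k in ks for s in ss for p in ps for d in ds
            if conv_out(i, k, s, p, d) == o]

# First layer, part (a): 64 -> 32 with k = 8 and no dilation.
print(find_configs(64, 32, ks=[8], ss=range(1, 5), ps=range(0, 5), ds=[1]))
# First layer, part (b): 64 -> 32 with d = 7 and s = 2.
print(find_configs(64, 32, ks=range(1, 10), ss=[2], ps=range(0, 25), ds=[7]))
```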
