Question 1 (4-4-4). Using the definition of the derivative, $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$, and the following definition of the Heaviside step function:
$$H(x) = \begin{cases} 1 & \text{if } x > 0 \\ 1/2 & \text{if } x = 0 \\ 0 & \text{if } x < 0 \end{cases}$$
- Show that the derivative of the rectified linear unit g(x) = max{0,x}, wherever it exists, is equal to the Heaviside step function.
- Give two alternative definitions of g(x) using H(x).
- Show that $H(x)$ is well approximated by the sigmoid function $\sigma(kx) = \frac{1}{1 + e^{-kx}}$ asymptotically (i.e., for large $k$), where $k$ is a parameter (see the numerical sketch after this question).
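The following is a minimal numerical sketch (not part of the required proof) of the last point: it checks that $\sigma(kx)$ approaches $H(x)$ pointwise as $k$ grows, under the convention $H(0) = 1/2$ used above.

```python
import numpy as np

def heaviside(x):
    # H(x) with the convention H(0) = 1/2, matching sigma(k * 0) = 1/2.
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for k in (1, 10, 100):
    # Largest pointwise gap between sigma(kx) and H(x) on the test grid.
    print(k, np.max(np.abs(sigmoid(k * x) - heaviside(x))))
# The gap shrinks for every x != 0 as k increases; at x = 0 both are exactly 1/2.
```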
Question 2 (3-3-3-3). Recall the definition of the softmax function: $S(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$.
- Show that softmax is translation-invariant, that is: $S(x + c) = S(x)$, where $c$ is a scalar constant.
- Show that softmax is not invariant under scalar multiplication. Let $S_c(x) = S(cx)$ where $c \geq 0$. What are the effects of taking $c$ to be 0 and arbitrarily large?
- Let $x$ be a 2-dimensional vector. One can represent a 2-class categorical probability using softmax $S(x)$. Show that $S(x)$ can be reparameterized using the sigmoid function, i.e. $S(x) = [\sigma(z), 1 - \sigma(z)]^\top$, where $z$ is a scalar function of $x$.
- Let $x$ be a $K$-dimensional vector ($K \geq 2$). Show that $S(x)$ can be represented using $K - 1$ parameters, i.e. $S(x) = S([0, y_1, y_2, \ldots, y_{K-1}]^\top)$, where $y_i$ is a scalar function of $x$ for $i \in \{1, \ldots, K-1\}$. (A numerical sketch of these properties follows this question.)
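Below is a small numerical sketch of the claims in this question, using assumed example vectors (the choice of $z = x_1 - x_2$ in the two-class case is my own illustration of the reparameterization):

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # shifting by a constant is safe by translation invariance
    return e / e.sum()

x = np.array([1.0, 2.0, 0.5])
print(np.allclose(softmax(x + 3.0), softmax(x)))    # translation invariance -> True

print(softmax(0.0 * x))      # c -> 0: softmax collapses to the uniform distribution
print(softmax(100.0 * x))    # c large: mass concentrates on the argmax entry

x2 = np.array([1.3, -0.4])
z = x2[0] - x2[1]            # a scalar function of x
sig = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(softmax(x2), [sig, 1.0 - sig]))   # S(x) = [sigmoid(z), 1 - sigmoid(z)] -> True
```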
Question 3 (16). Consider a 2-layer neural network $y : \mathbb{R}^D \to \mathbb{R}^K$ of the form
$$y_k(x; \theta) = \sum_{j} W^{(2)}_{kj}\, \sigma\!\Big(\sum_{i} W^{(1)}_{ji} x_i + b^{(1)}_j\Big) + b^{(2)}_k$$
for $1 \leq k \leq K$, with parameters $\theta = (\theta^{(1)}, \theta^{(2)})$, $\theta^{(l)} = (W^{(l)}, b^{(l)})$, and logistic sigmoid activation function $\sigma$. Show that there exists an equivalent network of the same form, with parameters $\theta' = (\theta'^{(1)}, \theta'^{(2)})$ and tanh activation function, such that $y_{\theta'}(x) = y_{\theta}(x)$ for all $x \in \mathbb{R}^D$, and express $\theta'$ as a function of $\theta$.
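The identity that makes this work is $\sigma(a) = \tfrac{1}{2}\big(1 + \tanh(a/2)\big)$: halving the first-layer parameters turns each hidden sigmoid into a tanh plus a constant, and the constants can be absorbed into the second layer. A quick numerical check of the identity (a sketch; the mapping $\theta \mapsto \theta'$ itself is what the question asks for):

```python
import numpy as np

a = np.linspace(-5.0, 5.0, 11)
sigma = 1.0 / (1.0 + np.exp(-a))
# sigma(a) == (1 + tanh(a / 2)) / 2 for all a
print(np.allclose(sigma, 0.5 * (1.0 + np.tanh(a / 2.0))))   # True
```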
Question 4 (5-5). Fundamentally, back-propagation is just a special case of reverse-mode Automatic Differentiation (AD), applied to a neural network. Based on the three-part notation shown in Tables 1 and 4, represent the evaluation trace and the derivative (adjoint) trace of the following examples. In the last columns of your solution, numerically evaluate the values to 4 decimal places.
- Forward AD, with $y = f(x_1, x_2) = \frac{1}{x_1 + x_2} + x_2^2 + \cos(x_1)$ at $(x_1, x_2) = (3, 6)$, setting $\dot{x}_1 = 1$ to compute $\partial y / \partial x_1$.
- Reverse AD, with $y = f(x_1, x_2) = \frac{1}{x_1 + x_2} + x_2^2 + \cos(x_1)$ at $(x_1, x_2) = (3, 6)$. Setting $\bar{y} = 1$, both $\partial y / \partial x_1$ and $\partial y / \partial x_2$ can be computed together.
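As a sanity check on the numbers that should appear in the traces (a sketch using a hand-rolled dual-number class, not the tabular trace the question asks for), the derivatives of $f$ at $(3, 6)$ can be computed by forward-mode AD and compared against the analytic gradient $\partial y/\partial x_1 = -1/(x_1+x_2)^2 - \sin(x_1)$ and $\partial y/\partial x_2 = -1/(x_1+x_2)^2 + 2x_2$:

```python
import math

class Dual:
    """Minimal forward-mode AD value: a (primal, tangent) pair."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.d + o.d)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.v * o.d + self.d * o.v)
    __rmul__ = __mul__
    def __truediv__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v / o.v, (self.d * o.v - self.v * o.d) / (o.v * o.v))
    def __rtruediv__(self, o):
        return Dual(o) / self

def cos(x):
    return Dual(math.cos(x.v), -math.sin(x.v) * x.d)

def f(x1, x2):
    return 1 / (x1 + x2) + x2 * x2 + cos(x1)

print(round(f(Dual(3.0, 1.0), Dual(6.0, 0.0)).d, 4))  # dy/dx1 = -1/81 - sin(3) ~ -0.1535
print(round(f(Dual(3.0, 0.0), Dual(6.0, 1.0)).d, 4))  # dy/dx2 = -1/81 + 12   ~ 11.9877
```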
Question 5 (6). Compute the full, valid, and same convolution (with kernel flipping) for the following 1D matrices:
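The arrays themselves are not reproduced above, so as an illustration with assumed placeholder inputs: `numpy.convolve` flips the kernel and implements exactly the full, valid, and same modes asked about here.

```python
import numpy as np

# Placeholder 1D signal and kernel (the actual arrays from the assignment are not shown above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])

print(np.convolve(x, w, mode="full"))   # length len(x) + len(w) - 1 = 7
print(np.convolve(x, w, mode="valid"))  # length len(x) - len(w) + 1 = 3
print(np.convolve(x, w, mode="same"))   # length len(x) = 5, the centered part of 'full'
```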
Question 6 (5-5). Consider a convolutional neural network. Assume the input is a color image of size 256 × 256 in the RGB representation. The first layer convolves 64 kernels of size 8 × 8 with the input, using a stride of 2 and no padding. The second layer downsamples the output of the first layer with 5 × 5 non-overlapping max pooling. The third layer convolves 128 kernels of size 4 × 4 with a stride of 1 and zero-padding of size 1 on each border.
- What is the dimensionality (scalar) of the output of the last layer?
- Not including the biases, how many parameters are needed for the last layer?
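A sketch of the shape and parameter bookkeeping for these three layers, using the standard one-dimensional output-size formula $o = \lfloor (i + 2p - d(k-1) - 1)/s \rfloor + 1$ (arithmetic support only, not the graded derivation):

```python
def conv_out(i, k, s=1, p=0, d=1):
    # Output size along one spatial dimension for a convolution or pooling window.
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

h = conv_out(256, k=8, s=2, p=0)   # layer 1: 64 kernels of 8x8, stride 2, no padding
h = conv_out(h, k=5, s=5, p=0)     # layer 2: 5x5 non-overlapping max pooling
h = conv_out(h, k=4, s=1, p=1)     # layer 3: 128 kernels of 4x4, stride 1, padding 1
print(128 * h * h)                 # scalar dimensionality of the last layer's output
print(128 * 64 * 4 * 4)            # weight count of the last layer, biases excluded
```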
Question 7 (4-4-6). Assume we are given data of size 3 × 64 × 64. In what follows, provide a correct configuration of a convolutional neural network layer that satisfies the specified assumption. Answer with the kernel window size (k), stride (s), padding (p), and dilation (d, with the convention d = 1 for no dilation). Use square windows only (i.e. the same k for both width and height).
- The output shape (o) of the first layer is (64, 32, 32).
  (a) Assume k = 8 without dilation.
  (b) Assume d = 7 and s = 2.
- The output shape of the second layer is (64, 8, 8). Assume p = 0 and d = 1.
  (a) Specify k and s for pooling with a non-overlapping window.
  (b) What is the output shape if k = 8 and s = 4 instead?
- The output shape of the last layer is (128, 4, 4).
  (a) Assume we are not using padding or dilation.
  (b) Assume d = 2, p = 2.
  (c) Assume p = 1, d = 1.
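The same output-size formula with dilation can be used to check candidate configurations against the target output shapes. The helper below is a sketch (the function name `find_configs` and the search ranges are my own choices); it brute-forces small (k, s, p, d) combinations using the question's notation.

```python
def conv_out(i, k, s=1, p=0, d=1):
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

def find_configs(i, o, ks, ss, ps, ds):
    """Enumerate (k, s, p, d) combinations mapping input size i to output size o."""
    return [(k, s, p, d)
            for k in ks for s in ss for p in ps for d in ds
            if conv_out(i, k, s, p, d) == o]

# First layer, part (a): 64 -> 32 with k = 8 and no dilation.
print(find_configs(64, 32, ks=[8], ss=range(1, 5), ps=range(0, 5), ds=[1]))
# First layer, part (b): 64 -> 32 with d = 7 and s = 2.
print(find_configs(64, 32, ks=range(1, 10), ss=[2], ps=range(0, 25), ds=[7]))
```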
