Submission: You must submit your solutions as a PDF file through MarkUs[1]. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner), as long as it is readable.
Late Submission: MarkUs will remain open until 3 days after the deadline, after which no late submissions will be accepted.
Weekly homeworks are individual work. See the Course Information handout[2] for detailed policies.
- Hard-Coding a Network. [2pts] In this problem, you need to find a set of weights and biases for a multilayer perceptron which determines if a list of length 4 is in sorted order. More specifically, you receive four inputs x1, ..., x4, where xi ∈ ℝ, and the network must output 1 if x1 < x2 < x3 < x4, and 0 otherwise. You will use the following architecture:
All of the hidden units and the output unit use a hard threshold activation function:

φ(z) = 1 if z ≥ 0, and φ(z) = 0 if z < 0.
Please give a set of weights and biases for the network which correctly implements this function (including cases where some of the inputs are equal). Your answer should include:
- A 3 × 4 weight matrix W^(1) for the hidden layer
- A 3-dimensional vector of biases b^(1) for the hidden layer
- A 3-dimensional weight vector w^(2) for the output layer
- A scalar bias b^(2) for the output layer
You do not need to show your work.
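As a sanity check (not part of the required answer), the following NumPy sketch shows the forward pass for this architecture and how a candidate parameter setting could be tested against the sorted-order predicate. The all-zero parameters are placeholders with the required shapes, not a solution; everything beyond the shapes and the target predicate is illustrative.

```python
import numpy as np

def hard_threshold(z):
    # phi(z) = 1 if z >= 0, else 0, applied elementwise
    return (np.asarray(z) >= 0).astype(float)

def network(x, W1, b1, w2, b2):
    # Forward pass: 4 inputs -> 3 hard-threshold hidden units -> 1 hard-threshold output
    h = hard_threshold(W1 @ x + b1)
    return float(hard_threshold(w2 @ h + b2))

# Placeholder parameters with the required shapes (NOT a correct answer).
W1 = np.zeros((3, 4))   # 3 x 4 weight matrix W^(1)
b1 = np.zeros(3)        # 3-dimensional bias vector b^(1)
w2 = np.zeros(3)        # 3-dimensional weight vector w^(2)
b2 = 0.0                # scalar bias b^(2)

# Count how often the network matches the target predicate on random inputs.
rng = np.random.default_rng(0)
errors = 0
for _ in range(10_000):
    x = rng.standard_normal(4)
    target = float(x[0] < x[1] < x[2] < x[3])
    errors += network(x, W1, b1, w2, b2) != target
print(f"mismatches on random inputs: {errors} / 10000")
```

Note that continuous random inputs will essentially never contain ties, so cases with equal inputs (which the problem asks you to handle) should be checked separately.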
- Consider a neural network with N input units, N output units, and K hidden units. The activations are computed as follows:
z = W^(1) x + b^(1)
h = σ(z)
y = x + W^(2) h + b^(2),
where σ denotes the logistic function, applied elementwise. The cost will involve both h and y:
J = R + S
R = r^T h
S = s^T y
for given vectors r and s.
- [1pt] Draw the computation graph relating x, z, h, y, R, S, and J.
- [3pts] Derive the backprop equations for computing x̄ = ∂J/∂x. You may use σ′ to denote the derivative of the logistic function (so you don't need to write it out explicitly).
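If you want to check your derived x̄ numerically, a finite-difference comparison is a common sanity check. The sketch below is only an aid, not part of the required answer: the sizes N = 5, K = 3 and the random parameters are arbitrary choices, and `J` simply evaluates the cost as defined above. Your hand-derived x̄ should agree with `numerical_grad` to several decimal places.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(x, W1, b1, W2, b2, r, s):
    # z = W^(1) x + b^(1),  h = sigma(z),  y = x + W^(2) h + b^(2),  J = r^T h + s^T y
    z = W1 @ x + b1
    h = sigmoid(z)
    y = x + W2 @ h + b2
    return r @ h + s @ y

# Random instance with N = 5 input/output units and K = 3 hidden units (sizes are arbitrary).
rng = np.random.default_rng(0)
N, K = 5, 3
x = rng.standard_normal(N)
W1, b1 = rng.standard_normal((K, N)), rng.standard_normal(K)
W2, b2 = rng.standard_normal((N, K)), rng.standard_normal(N)
r, s = rng.standard_normal(K), rng.standard_normal(N)

# Central-difference estimate of dJ/dx, one coordinate at a time.
eps = 1e-6
numerical_grad = np.zeros(N)
for i in range(N):
    e = np.zeros(N); e[i] = eps
    numerical_grad[i] = (J(x + e, W1, b1, W2, b2, r, s)
                         - J(x - e, W1, b1, W2, b2, r, s)) / (2 * eps)
print(numerical_grad)  # compare against the x-bar from your backprop equations
```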
- Sparsifying Activation Function. [4pts] One of the interesting features of the ReLU activation function is that it sparsifies the activations and the derivatives, i.e. sets a large fraction of the values to zero for any given input vector. Consider the following network:
Note that each wi refers to the weight on a single connection, not the whole layer. Suppose we are trying to minimize a loss function L which depends only on the activation of the output unit y. (For instance, L could be the squared error loss.) Suppose the unit h1 receives an input of -1 on a particular training case, so the ReLU evaluates to 0. Based only on this information, which of the weight derivatives ∂L/∂wi are guaranteed to be 0 for this training case? Write YES or NO for each. Justify your answers.
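The question refers to the network in the figure above, so nothing below is part of the answer. The sketch is only a toy one-hidden-unit chain, with made-up names w_in, w_out, and target t, that illustrates the mechanism in isolation: when a ReLU's pre-activation is negative, its output is 0, and finite differences show which derivatives vanish in this toy case.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def loss(w_in, w_out, x, t):
    # Toy chain (NOT the network in the figure): x --w_in--> ReLU --w_out--> y, squared error vs. t
    h = relu(w_in * x)
    y = w_out * h
    return 0.5 * (y - t) ** 2

# Choose values so the ReLU's input w_in * x is negative, i.e. the unit outputs 0.
w_in, w_out, x, t = 1.0, 2.0, -1.0, 0.5
eps = 1e-6
dL_dw_in = (loss(w_in + eps, w_out, x, t) - loss(w_in - eps, w_out, x, t)) / (2 * eps)
dL_dw_out = (loss(w_in, w_out + eps, x, t) - loss(w_in, w_out - eps, x, t)) / (2 * eps)
print(dL_dw_in, dL_dw_out)  # both 0.0 in this toy chain
```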
[1] https://markus.teach.cs.toronto.edu/csc421-2019-01
[2] http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/syllabus.pdf