[Solved] CSE676 Assignment #1-Softmax


0.1 Softmax

  • Prove that softmax is invariant to constant shifts in the input, i.e., for any input vector $x$ and a constant scalar $c$, the following holds:

$$\mathrm{softmax}(x) = \mathrm{softmax}(x + c),$$

where $\mathrm{softmax}(x)_i = e^{x_i} / \sum_j e^{x_j}$, and $x + c$ means adding $c$ to every dimension of $x$. (A numerical check of this property is sketched after this list.)

  • Let $z = Wx + c$, where $W$ and $c$ are some matrix and vector, respectively. Let

$$J = \sum_i \log \mathrm{softmax}(z)_i .$$

Calculate the derivatives of $J$ w.r.t. $W$ and $c$, respectively, i.e., calculate $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial c}$.
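Before writing the proof, both claims are easy to sanity-check numerically. The sketch below (NumPy assumed; the helper names `softmax` and `objective` and the random data are mine, not part of the assignment) verifies the shift invariance on a random vector and finite-difference-checks one coordinate of $\frac{\partial J}{\partial c}$.

```python
import numpy as np

def softmax(v):
    # Subtracting the max is itself an application of the shift-invariance
    # property being proved; it only improves numerical stability.
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5)

# Shift invariance: softmax(x) == softmax(x + c) for a scalar c.
assert np.allclose(softmax(x), softmax(x + 3.7))

# J = sum_i log softmax(z)_i with z = W x + c (the vector c is named
# cvec here to avoid clashing with the scalar shift above).
W = rng.normal(size=(4, 5))
cvec = rng.normal(size=4)

def objective(W, cvec):
    return np.log(softmax(W @ x + cvec)).sum()

# Central finite differences for one coordinate of dJ/dc.
eps = 1e-5
e0 = np.zeros(4); e0[0] = 1.0
num = (objective(W, cvec + eps * e0) - objective(W, cvec - eps * e0)) / (2 * eps)
print("numerical dJ/dc_0 ≈", num)
```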

0.2 Logistic Regression with Regularization

  • [10 points] Let the data be $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$. Logistic regression is a binary classification model, with the probability of $y_i$ being 1 given as:

$$p(y_i = 1 \mid x_i) = \sigma(\theta^\top x_i) = \frac{1}{1 + e^{-\theta^\top x_i}},$$

where $\theta$ is the model parameter. Assume we impose an L2 regularization term on the parameter, defined as:

$$R(\theta) = \frac{\lambda}{2} \|\theta\|_2^2,$$

with a positive constant $\lambda$. Write out the final objective function for this logistic regression with regularization model.

  • [10 points] Suppose we use gradient descent to solve for the model parameter. Derive the update rule for $\theta$. Your answer should contain the derivation, not just the final answer. (A minimal gradient-descent sketch appears after this list.)
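For reference, here is a minimal gradient-descent sketch under the usual assumptions: the objective is the negative log-likelihood plus the $\frac{\lambda}{2}\|\theta\|_2^2$ term above, whose gradient is $X^\top(\sigma(X\theta) - y) + \lambda\theta$. The function name `fit_logreg_l2` and the synthetic data are mine; treat this as a way to check a derivation, not as the graded answer.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg_l2(X, y, lam=0.1, lr=0.01, iters=1000):
    """Gradient descent for L2-regularized logistic regression.

    Assumes the objective
        L(theta) = -sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
                   + (lam / 2) * ||theta||^2,
    whose gradient is X^T (p - y) + lam * theta, with p = sigmoid(X theta).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)               # P(y_i = 1 | x_i)
        grad = X.T @ (p - y) + lam * theta   # gradient of the objective
        theta -= lr * grad                   # the update rule being derived
    return theta

# Tiny synthetic usage example.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
print(fit_logreg_l2(X, y))
```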

0.3 Derivative of the Softmax Function

1) [10 points] Define the loss function as

$$J(z) = -\sum_{k=1}^{K} y_k \log \hat{y}_k ,$$

where $\hat{y} = \mathrm{softmax}(z)$, and $(y_1, \ldots, y_K)$ is a known probability vector. Derive $\frac{\partial J}{\partial z}$.

Note $z = (z_1, \ldots, z_K)$ is a vector, so $\frac{\partial J}{\partial z}$ is in the form of a vector. Your answer should contain the derivation, not just the final answer. (A finite-difference check is sketched below.)
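Whatever form the derivation produces can be validated numerically. The sketch below (NumPy assumed; helpers are mine) compares central finite differences of $J$ against $\mathrm{softmax}(z) - y$, the closed form this derivation is commonly quoted as yielding.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def J(z, y):
    # Cross-entropy against the known probability vector y.
    return -(y * np.log(softmax(z))).sum()

rng = np.random.default_rng(2)
K = 6
z = rng.normal(size=K)
y = rng.dirichlet(np.ones(K))  # sums to 1, as the problem states

# Central finite differences for each coordinate of dJ/dz.
eps = 1e-5
num = np.array([(J(z + eps * e, y) - J(z - eps * e, y)) / (2 * eps)
                for e in np.eye(K)])

# Compare against the commonly quoted closed form softmax(z) - y.
assert np.allclose(num, softmax(z) - y)
print("finite differences match softmax(z) - y")
```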


2) [10 points] Assume the above softmax is the output layer of an FNN. Briefly explain how this derivative is used in the backpropagation algorithm.

3) [10 points] Let $z = W^\top h + b$, where $W$ is a matrix and $b$ and $h$ are vectors. Use the chain rule to calculate the gradients of $J$ w.r.t. $W$ and $b$, i.e., $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial b}$, respectively. (See the sketch below for a numerical check.)
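Continuing the sketch from problem 1, the chain rule here is usually summarized as: with $g = \frac{\partial J}{\partial z}$, one gets $\frac{\partial J}{\partial W} = h\,g^\top$ and $\frac{\partial J}{\partial b} = g$. The check below spot-verifies one entry of $\frac{\partial J}{\partial W}$ by finite differences (NumPy assumed; helper names are mine).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(W, b, h, y):
    # Cross-entropy from problem 1 applied to z = W^T h + b.
    return -(y * np.log(softmax(W.T @ h + b))).sum()

rng = np.random.default_rng(3)
d, K = 4, 3
W = rng.normal(size=(d, K))
b = rng.normal(size=K)
h = rng.normal(size=d)
y = rng.dirichlet(np.ones(K))  # known probability vector

# Chain rule: g = dJ/dz, then dJ/dW = outer(h, g) and dJ/db = g.
g = softmax(W.T @ h + b) - y
dW = np.outer(h, g)

# Spot-check entry (1, 2) of dJ/dW with central finite differences.
eps = 1e-5
E = np.zeros_like(W); E[1, 2] = 1.0
num = (loss(W + eps * E, b, h, y) - loss(W - eps * E, b, h, y)) / (2 * eps)
assert np.isclose(num, dW[1, 2])
print("chain-rule gradients match finite differences")
```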

0.4 MNIST with FNN

1) [30 points] Design an FNN for MNIST classification. Implement the model and plot two curves in one figure: i) training loss vs. training iterations; ii) test loss vs. training iterations. (A minimal starting-point sketch appears after this list.)

  • You can use online code. However, you must reference (cite) the code in your answer.
  • Submission includes the plot of the two curves and the runnable code (with a ReadMe file containing instructions on how to run the code).
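As a starting point, here is a minimal sketch assuming PyTorch, torchvision, and matplotlib are installed. It is not the required submission (no ReadMe or citation is included), and the architecture, optimizer, and logging interval are arbitrary choices of mine.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Standard MNIST loaders.
tfm = transforms.ToTensor()
train_ds = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=512)

# A small fully connected network: 784 -> 256 -> 10.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def test_loss():
    # Average cross-entropy over the whole test set.
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for xb, yb in test_dl:
            total += loss_fn(model(xb), yb).item() * len(xb)
            n += len(xb)
    model.train()
    return total / n

train_curve, test_curve, iters = [], [], []
step = 0
for epoch in range(2):
    for xb, yb in train_dl:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        if step % 100 == 0:  # record both losses every 100 iterations
            train_curve.append(loss.item())
            test_curve.append(test_loss())
            iters.append(step)
        step += 1

# Plot both curves in one figure, as the assignment asks.
plt.plot(iters, train_curve, label="training loss")
plt.plot(iters, test_curve, label="test loss")
plt.xlabel("training iterations"); plt.ylabel("loss"); plt.legend()
plt.savefig("mnist_fnn_losses.png")
```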
