0.1 Softmax
- Prove that softmax is invariant to constant shifts in the input, i.e., for any input vector $x$ and a constant scalar $c$, the following holds:
$\mathrm{softmax}(x) = \mathrm{softmax}(x + c)$,
where $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$, and $x + c$ means adding $c$ to every dimension of $x$.
- Let $z = Wx + c$, where $W$ and $c$ are some matrix and vector, respectively. Let
$J = \sum_i \log \mathrm{softmax}(z)_i$.
Calculate the derivatives of $J$ w.r.t. $W$ and $c$, respectively, i.e., calculate $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial c}$.
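The shift-invariance property in the first item can be checked numerically before it is proved. The sketch below is only an illustration, not a proof: the function names `softmax_naive` and `softmax_stable` and the test vectors are made up for this example. It also shows one practical use of the property, namely shifting by the maximum entry so that the exponentials cannot overflow.

```python
import math

def softmax_naive(x):
    """Direct definition: exp(x_i) / sum_j exp(x_j)."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(x):
    """Same function, but shifted by c = -max(x) so exp() cannot overflow."""
    m = max(x)
    return softmax_naive([v - m for v in x])

# Illustrative input and shift (hypothetical values).
x = [1.0, 2.0, 3.0]
c = 5.0
p = softmax_naive(x)
p_shifted = softmax_naive([v + c for v in x])
max_diff = max(abs(a - b) for a, b in zip(p, p_shifted))
print(max_diff)  # tiny, floating-point rounding only

# softmax_naive(big) would raise OverflowError; the shifted version does not.
big = [1000.0, 1001.0, 1002.0]
p_big = softmax_stable(big)
print(p_big)
```

Because `big` is just `x` shifted by 999, `p_big` should agree with `p` up to rounding, which is the invariance statement applied to a case where the direct formula cannot even be evaluated.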
0.2 Logistic Regression with Regularization
- [10 points] Let the data be $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$. Logistic regression is a binary classification model, with the probability of $y_i$ being 1 as:
$p(y_i = 1 \mid x_i; \theta) = \sigma(\theta^T x_i) = \frac{1}{1 + e^{-\theta^T x_i}}$,
where $\theta$ is the model parameter. Assume we impose an L2 regularization term on the parameter, defined as:
$R(\theta) = \frac{\lambda}{2} \theta^T \theta$,
with a positive constant $\lambda$. Write out the final objective function for this logistic regression with regularization model.
- [10 points] Suppose we use gradient descent to solve for the model parameter $\theta$. Derive the update rule for $\theta$. Your answer should contain the derivation, not just the final answer.
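A derived update rule can be sanity-checked in a few lines of code. The following is a minimal sketch under stated assumptions, not the reference solution: it assumes the parameter vector is called `theta`, that the objective is the summed negative log-likelihood plus `(lam / 2) * ||theta||^2`, and it uses a tiny hand-made dataset. The learning rate `lr`, the step count, and all function names are hypothetical choices for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def objective(theta, X, y, lam):
    """Summed negative log-likelihood plus (lam / 2) * ||theta||^2 (assumed form)."""
    nll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        nll -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return nll + 0.5 * lam * sum(t * t for t in theta)

def gradient(theta, X, y, lam):
    """Candidate gradient of the objective: sum_i (p_i - y_i) * x_i + lam * theta."""
    grad = [lam * t for t in theta]
    for xi, yi in zip(X, y):
        p = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        for j, v in enumerate(xi):
            grad[j] += (p - yi) * v
    return grad

def gradient_descent(X, y, lam=0.1, lr=0.1, steps=200):
    """Plain gradient descent: theta <- theta - lr * gradient(theta)."""
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(steps):
        history.append(objective(theta, X, y, lam))
        g = gradient(theta, X, y, lam)
        theta = [t - lr * gj for t, gj in zip(theta, g)]
    return theta, history

# Toy data (hypothetical): first feature is a constant 1 acting as a bias term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, 0, 0]
theta, history = gradient_descent(X, y)
print(history[0], history[-1])
```

If the objective in your answer averages over the data or scales the regularizer differently, the gradient changes by the corresponding constant factors, and `objective` and `gradient` should be adjusted together.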
0.3 Derivative of the Softmax Function
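1) [10 points] Define the loss function as
$J(z) = \sum_{k=1}^{K} y_k \log \hat{y}_k$,
where $\hat{y}_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$, and $(y_1, \ldots, y_K)$ is a known probability vector. Derive the gradient $\frac{\partial J(z)}{\partial z}$.
Note that $z = (z_1, \ldots, z_K)$ is a vector, so $\frac{\partial J(z)}{\partial z}$ is in the form of a vector. Your answer should contain the derivation, not just the final answer.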
CSE 676 Changyou Chen Spring 2021
2) [10 points] Assume the above softmax is the output layer of an FNN. Briefly explain how the derivative is used in the backpropagation algorithm.
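3) [10 points] Let $z = W^T h + b$, where $W$ is a matrix and $b$ and $h$ are vectors. Use the chain rule to calculate the gradients of $J$ with respect to $W$ and $b$, i.e., $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial b}$, respectively.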
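Hand-derived gradients for items 1) and 3) can be compared against central finite differences. The sketch below is one way to do that, under assumptions: it takes J exactly as written in item 1), with no leading minus sign (flip every sign if your J is the negated, cross-entropy form), it assumes y sums to 1, and the candidate expressions inside `candidate_grads` are to be replaced by whatever your own derivation produces. All names and numeric values are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, h):
    """z = W^T h + b, with W stored as len(h) rows by len(b) columns."""
    return [sum(W[i][k] * h[i] for i in range(len(h))) + b[k] for k in range(len(b))]

def loss(W, b, h, y):
    """J = sum_k y_k * log(softmax(z)_k), as written in item 1)."""
    return sum(yk * math.log(pk) for yk, pk in zip(y, softmax(logits(W, b, h))))

def candidate_grads(W, b, h, y):
    """Candidate gradients to be verified (assumes sum(y) == 1)."""
    dz = [yk - pk for yk, pk in zip(y, softmax(logits(W, b, h)))]  # dJ/dz
    dW = [[hi * d for d in dz] for hi in h]  # chain rule: outer product h (dJ/dz)^T
    db = list(dz)                            # chain rule: dz/db is the identity
    return dW, db

def numeric_grads(W, b, h, y, eps=1e-6):
    """Central finite differences of the loss with respect to W and b."""
    dW = [[0.0] * len(b) for _ in h]
    for i in range(len(h)):
        for k in range(len(b)):
            old = W[i][k]
            W[i][k] = old + eps
            up = loss(W, b, h, y)
            W[i][k] = old - eps
            down = loss(W, b, h, y)
            W[i][k] = old
            dW[i][k] = (up - down) / (2 * eps)
    db = [0.0] * len(b)
    for k in range(len(b)):
        old = b[k]
        b[k] = old + eps
        up = loss(W, b, h, y)
        b[k] = old - eps
        down = loss(W, b, h, y)
        b[k] = old
        db[k] = (up - down) / (2 * eps)
    return dW, db

W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]]  # 2 hidden units, K = 3 classes
b = [0.1, -0.2, 0.05]
h = [1.5, -0.7]
y = [0.2, 0.5, 0.3]
dW_a, db_a = candidate_grads(W, b, h, y)
dW_n, db_n = numeric_grads(W, b, h, y)
err_W = max(abs(a - n) for ra, rn in zip(dW_a, dW_n) for a, n in zip(ra, rn))
err_b = max(abs(a - n) for a, n in zip(db_a, db_n))
print(err_W, err_b)
```

A small error here only says the candidate formulas agree with the numerical slope at one point; the written derivation is still required.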
0.4 MNIST with FNN
1) [30 points] Design an FNN for MNIST classification. Implement the model and plot two curves in one figure: i) training loss vs. training iterations; ii) test loss vs. training iterations.
- You can use online code. However, you must reference (cite) the code in your answer.
- Submission includes the plot of the two curves and the runnable code (with a ReadMe file containing instructions on how to run the code).
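The bookkeeping needed for the two curves can be prototyped without any framework. The sketch below is a toy stand-in, not an MNIST solution: it trains a one-hidden-layer ReLU network with per-sample SGD on small synthetic data (an assumption made so the example runs with the standard library only), uses the usual cross-entropy loss, and records training and test loss at the same iterations. Layer sizes, learning rate, and all names are hypothetical.

```python
import math
import random

random.seed(0)

def make_data(n, d=8, k=3):
    """Synthetic stand-in for MNIST: label = index of the largest of the first k features."""
    X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [max(range(k), key=lambda j: x[j]) for x in X]
    return X, y

def init(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def forward(params, x):
    W1, b1, W2, b2 = params
    h = [max(0.0, sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    z = [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W2, b2)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]

def mean_loss(params, X, y):
    """Average cross-entropy over a dataset."""
    return -sum(math.log(forward(params, x)[1][t]) for x, t in zip(X, y)) / len(X)

def sgd_step(params, x, t, lr):
    W1, b1, W2, b2 = params
    h, p = forward(params, x)
    dz = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]  # output-layer gradient
    # Backpropagate through W2 and the ReLU before W2 is updated.
    dh = [sum(W2[k][j] * dz[k] for k in range(len(dz))) if h[j] > 0 else 0.0
          for j in range(len(h))]
    for k in range(len(dz)):
        for j in range(len(h)):
            W2[k][j] -= lr * dz[k] * h[j]
        b2[k] -= lr * dz[k]
    for j in range(len(h)):
        for i in range(len(x)):
            W1[j][i] -= lr * dh[j] * x[i]
        b1[j] -= lr * dh[j]

Xtr, ytr = make_data(200)
Xte, yte = make_data(100)
params = [init(16, 8), [0.0] * 16, init(3, 16), [0.0] * 3]
iters, train_curve, test_curve = [], [], []
for it in range(1001):
    if it % 50 == 0:  # log both losses at the same iteration
        iters.append(it)
        train_curve.append(mean_loss(params, Xtr, ytr))
        test_curve.append(mean_loss(params, Xte, yte))
    i = random.randrange(len(Xtr))
    sgd_step(params, Xtr[i], ytr[i], lr=0.05)
print(train_curve[0], train_curve[-1])
```

Plotting `train_curve` and `test_curve` against `iters` on the same axes, for example with two `plt.plot` calls in matplotlib, gives the single figure the task asks for. For the actual submission the synthetic data would be replaced by MNIST and the loops by a framework of your choice.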