The Wisconsin State Climatology Office keeps a record of the number of days Lake Mendota was covered by ice at http://www.aos.wisc.edu/sco/lakes/Mendota-ice.html, and the same for Lake Monona at http://www.aos.wisc.edu/sco/lakes/Monona-ice.html. As with any real problem, the data are not as clean or as organized as one would like for machine learning. Curate two clean data sets, one for each lake, starting from 1855-56 and ending in 2018-19. Let x be the year: for 1855-56, x = 1855; for 2017-18, x = 2017; and so on. Let y be the number of ice days in that year: for Mendota and 1855-56, y = 118; for 2017-18, y = 94; and so on. Some years have multiple freeze-thaw cycles, such as 2001-02; for that year, x = 2001 and y = 21.
- Plot year vs. ice days for the two lakes as two curves in the same plot. Produce another plot of year vs. $y_{Monona} - y_{Mendota}$.
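A minimal plotting sketch, assuming the curated data sit in two CSV files, `mendota.csv` and `monona.csv`, each with a header row and `year,days` columns (the file names and layout are assumptions, not part of the assignment):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: one header row, then "year,days" per line, same years in both files.
mendota = np.loadtxt("mendota.csv", delimiter=",", skiprows=1)
monona = np.loadtxt("monona.csv", delimiter=",", skiprows=1)

# Plot 1: ice days per year for both lakes.
plt.figure()
plt.plot(mendota[:, 0], mendota[:, 1], label="Mendota")
plt.plot(monona[:, 0], monona[:, 1], label="Monona")
plt.xlabel("year")
plt.ylabel("ice days")
plt.legend()

# Plot 2: the per-year difference y_Monona - y_Mendota.
plt.figure()
plt.plot(mendota[:, 0], monona[:, 1] - mendota[:, 1])
plt.xlabel("year")
plt.ylabel("Monona ice days minus Mendota ice days")
plt.show()
```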
- Split the data sets: $x \le 1970$ as training, and $x > 1970$ as test. (Comment: due to the temporal nature of the data this is NOT an i.i.d. split, but we will work with it.) On the training set, compute the sample mean and the sample standard deviation of the ice days for each of the two lakes.
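A sketch of the split and the summary statistics, reusing the arrays loaded above and assuming both files list the same years in the same order (whether to report the biased or unbiased standard deviation is a choice; state which one you use):

```python
# Temporal split: years up to and including 1970 train, later years test.
train_mask = mendota[:, 0] <= 1970
mendota_train, mendota_test = mendota[train_mask], mendota[~train_mask]
monona_train, monona_test = monona[train_mask], monona[~train_mask]

for name, days in [("Mendota", mendota_train[:, 1]),
                   ("Monona", monona_train[:, 1])]:
    # ddof=1 gives the unbiased sample standard deviation.
    print(name, days.mean(), days.std(ddof=1))
```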
- Using the training sets, train a linear regression model
$$y_{Mendota} = \beta_0 + \beta_1 x + \beta_2\, y_{Monona}$$
to predict $y_{Mendota}$. Note: we are treating $y_{Monona}$ as an observed feature. Do this by finding the closed-form MLE solution for $\beta = (\beta_0, \beta_1, \beta_2)^\top$ (no regularization):
$$\hat{\beta}_{MLE} = \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n}\left(\beta_0 + \beta_1 x^{(i)} + \beta_2\, y_{Monona}^{(i)} - y_{Mendota}^{(i)}\right)^2.$$
Give the MLE formula in matrix form (define your matrices), then give the MLE values of $\beta_0, \beta_1, \beta_2$.
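A minimal sketch of the closed-form fit, continuing from the arrays built above (the design matrix assumes a leading column of ones for the bias; `lstsq` returns the same least-squares solution as the normal equations, just more stably):

```python
# Design matrix [1, year, Monona ice days]; target: Mendota ice days.
X_train = np.column_stack([np.ones(len(mendota_train)),
                           mendota_train[:, 0],
                           monona_train[:, 1]])
y_train = mendota_train[:, 1]

# Equivalent to solving the normal equations (X^T X) beta = X^T y.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print(beta)  # beta_0, beta_1, beta_2
```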
- Using the MLE above, give (1) the mean squared error and (2) the $R^2$ value on the Mendota test set. (You will need to use the Monona test data as observed features.)
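A sketch of the test-set evaluation, continuing from the fitted `beta` above (the $R^2$ here normalizes by the test-set variance of $y_{Mendota}$, which is one common convention; confirm the one your course expects):

```python
X_test = np.column_stack([np.ones(len(mendota_test)),
                          mendota_test[:, 0],
                          monona_test[:, 1]])
y_test = mendota_test[:, 1]

pred = X_test @ beta
mse = np.mean((pred - y_test) ** 2)
# R^2 = 1 - residual sum of squares / total sum of squares.
r2 = 1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(mse, r2)
```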
- Reset to Q3, but this time use gradient descent to learn the $\beta$'s. Recall that our objective function is the mean squared error on the training set:
$$\ell(\beta) = \frac{1}{n}\sum_{i=1}^{n}\left(\beta_0 + \beta_1 x^{(i)} + \beta_2\, y_{Monona}^{(i)} - y_{Mendota}^{(i)}\right)^2.$$
Derive the gradient.
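As a check on your own derivation, differentiating the objective above term by term gives the following (with $X$ and $y$ the training design matrix and target vector from the closed-form question):

$$\nabla_\beta\, \ell(\beta) = \frac{2}{n}\sum_{i=1}^{n}\left(\beta_0 + \beta_1 x^{(i)} + \beta_2\, y_{Monona}^{(i)} - y_{Mendota}^{(i)}\right)\begin{pmatrix}1 \\ x^{(i)} \\ y_{Monona}^{(i)}\end{pmatrix} = \frac{2}{n}\,X^\top\left(X\beta - y\right).$$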
- Implement gradient descent. Initialize $\beta_0 = \beta_1 = \beta_2 = 0$. Use a fixed step size of 0.1 and print the objective function value for the first 10 iterations. Tell us whether further iterations make your gradient descent converge, and if yes, when; compare the resulting $\beta$'s to the closed-form solution. Try other step-size values and tell us what happens. Hint: update $\beta_0, \beta_1, \beta_2$ simultaneously within an iteration; don't use a new $\beta_0$ to calculate $\beta_1$, and so on.
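A minimal gradient-descent sketch consistent with the gradient above, reusing `X_train` and `y_train` (with the raw, unnormalized year feature, a step size of 0.1 may well blow up; that behavior is itself worth reporting):

```python
def gradient_descent(X, y, step=0.1, iters=10):
    """Plain batch gradient descent on the mean squared error."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for t in range(iters):
        grad = 2.0 / n * X.T @ (X @ beta - y)  # gradient of the MSE objective
        beta = beta - step * grad              # all coordinates updated at once
        print(t + 1, np.mean((X @ beta - y) ** 2))
    return beta

beta_gd = gradient_descent(X_train, y_train, step=0.1, iters=10)
```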
- As preprocessing, normalize your year and Monona features (but not $y_{Mendota}$). Then repeat Q6.
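A sketch of one standardization convention, reusing the arrays and the `gradient_descent` sketch from earlier (scaling the test features with training-set statistics is an assumption on my part; use whatever convention your course specifies):

```python
def standardize(train_col, test_col):
    """Zero-mean, unit-variance scaling using training-set statistics."""
    mu, sigma = train_col.mean(), train_col.std()
    return (train_col - mu) / sigma, (test_col - mu) / sigma

year_tr, year_te = standardize(mendota_train[:, 0], mendota_test[:, 0])
mon_tr, mon_te = standardize(monona_train[:, 1], monona_test[:, 1])

X_train_norm = np.column_stack([np.ones(len(year_tr)), year_tr, mon_tr])
beta_gd_norm = gradient_descent(X_train_norm, y_train, step=0.1, iters=10)
```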
- Reset to Q3 (no normalization, use the closed-form solution), but train a linear regression model without using Monona:
$$y_{Mendota} = \gamma_0 + \gamma_1 x.$$
- Interpret the sign of $\gamma_1$.
- Some analysts claim that because $\beta_1$ in the closed-form solution in Q3 is positive, then, fixing all other factors, the number of Mendota ice days will increase as the years go by; namely, the model in Q3 indicates a cooling trend. Discuss this viewpoint and relate it to question 8(a).
- Of course, Weka has linear regression. Reset to Q3. Save the training data in .arff format for Weka. Use classifiers / functions / LinearRegression. Choose "Use training set". Bring up the LinearRegression options and set ridge to 0 so that it does not regularize. Run it and tell us the model: it appears in the output in the form $\beta_1$ * year + $\beta_2$ * Monona + $\beta_0$.
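A sketch of one way to write the training data as an .arff file, reusing the training arrays from earlier (the relation and attribute names are my own choices; Weka only needs the `@relation` / `@attribute` / `@data` structure):

```python
def write_arff(path, rows):
    """Write (year, Monona days, Mendota days) rows as a Weka .arff file."""
    with open(path, "w") as f:
        f.write("@relation mendota-ice\n\n")
        f.write("@attribute year numeric\n")
        f.write("@attribute monona numeric\n")
        f.write("@attribute mendota numeric\n\n")
        f.write("@data\n")
        for year, mon, men in rows:
            f.write(f"{int(year)},{mon},{men}\n")

write_arff("mendota_train.arff",
           zip(mendota_train[:, 0], monona_train[:, 1], mendota_train[:, 1]))
```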
- Ridge regression.
- Then set ridge to 1 and tell us the resulting Weka model.
- Meanwhile, derive the closed-form solution in matrix form for the ridge regression problem
$$\min_{\beta}\;\frac{1}{n}\left\|X\beta - y\right\|_2^2 + \lambda\left\|\beta\right\|_A^2,$$
where
$$\|\beta\|_A^2 := \beta^\top A\beta
\qquad\text{and}\qquad
A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
This A matrix has the effect of NOT regularizing the bias $\beta_0$, which is standard practice in ridge regression. Note: derive the closed-form solution; do not blindly copy the lecture notes.
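For reference, a sketch of where the derivation should land under the mean-squared-error scaling written above (if your objective omits the 1/n factor, the n in front of $\lambda$ disappears):

$$\hat{\beta}_{ridge} = \left(X^\top X + n\lambda A\right)^{-1} X^\top y.$$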
- Let $\lambda = 1$ and tell us the value of $\beta$ from your ridge regression model.
Extra Credit: Multinomial Naïve Bayes
Consider the Multinomial Naïve Bayes model. For each data point $(x, y)$, $y \in \{0, 1\}$ and $x = (x_1, x_2, \ldots, x_M)$, where each $x_j$ is an integer from $\{1, 2, \ldots, K\}$ for $1 \le j \le M$. Here $K$ and $M$ are two fixed integers. Suppose we have $N$ data points $\{(x^{(i)}, y^{(i)}) : 1 \le i \le N\}$, generated as follows.
for $i \in \{1, \ldots, N\}$:
    $y^{(i)} \sim \mathrm{Bernoulli}(\alpha)$
    for $j \in \{1, \ldots, M\}$:
        $x_j^{(i)} \sim \mathrm{Multinomial}(\theta_{y^{(i)}}, 1)$

Here $\alpha \in \mathbb{R}$ and $\theta_k \in \mathbb{R}^K$ ($k \in \{0, 1\}$) are parameters. Note that $\sum_{l} \theta_{k,l} = 1$, since they are the parameters of a multinomial distribution.
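A small simulation of this generative process may make the model concrete (the specific values of $\alpha$ and $\theta_k$ below are arbitrary placeholders, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 100, 5, 3
alpha = 0.4                                    # P(y = 1), arbitrary
theta = {0: np.array([0.5, 0.3, 0.2]),         # theta_0 over the K values, arbitrary
         1: np.array([0.2, 0.2, 0.6])}         # theta_1 over the K values, arbitrary

X = np.zeros((N, M), dtype=int)
y = np.zeros(N, dtype=int)
for i in range(N):
    y[i] = rng.binomial(1, alpha)              # y^(i) ~ Bernoulli(alpha)
    for j in range(M):
        # x_j^(i) ~ Multinomial(theta_{y^(i)}, 1), stored as a value in {1, ..., K}
        X[i, j] = rng.choice(K, p=theta[y[i]]) + 1
```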
Derive the formulas for estimating the parameters $\alpha$ and $\theta_k$, as we have done in the lecture for the Bernoulli Naïve Bayes model. Show the steps.
Extra Credit: Logistic Regression
- Suppose for each class $i \in \{1, \ldots, K\}$, the class-conditional density $p(x \mid y = i)$ is normal with mean $\mu_i \in \mathbb{R}^d$ and the same covariance $\Sigma \in \mathbb{R}^{d \times d}$:
$$p(x \mid y = i) = \mathcal{N}(x \mid \mu_i, \Sigma).$$
Compute $p(y = i \mid x)$. Can it be represented as a softmax over a linear transformation of $x$? Show the calculation steps.
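A numerical sanity check for this shared-covariance case (the priors, means, and covariance below are arbitrary; the logits are the linear form a correct derivation should arrive at, so treat the comparison as a way to verify your own answer rather than as the answer itself):

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov)."""
    d = len(x)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

rng = np.random.default_rng(0)
d, K = 2, 3
prior = np.array([0.2, 0.5, 0.3])                 # P(y = i), arbitrary
mu = rng.normal(size=(K, d))                      # class means, arbitrary
B = rng.normal(size=(d, d))
Sigma = B @ B.T + np.eye(d)                       # shared positive-definite covariance
x = rng.normal(size=d)

# Posterior via Bayes' rule.
joint = np.array([prior[i] * gauss_pdf(x, mu[i], Sigma) for i in range(K)])
posterior = joint / joint.sum()

# Candidate softmax over a linear function of x.
Sinv = np.linalg.inv(Sigma)
logits = np.array([mu[i] @ Sinv @ x - 0.5 * mu[i] @ Sinv @ mu[i] + np.log(prior[i])
                   for i in range(K)])
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()

print(np.allclose(posterior, softmax))            # should print True
```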
- Suppose instead that $p(x \mid y = i)$ has a different covariance $\Sigma_i \in \mathbb{R}^{d \times d}$ for each class:
$$p(x \mid y = i) = \mathcal{N}(x \mid \mu_i, \Sigma_i).$$
Again, compute $p(y = i \mid x)$. Can it be represented as a softmax over a linear transformation of $x$? Show the calculation steps.