CS540 P8: Ice Cover Regression

Assignment Goals

  • Implement a linear regression calculation
  • Examine the trends in real (messy) data

Summary

Ed Hopkins over at the Wisconsin State Climatology Office maintains a listing of the dates and durations of full-freeze ice covers on our dear Madison lakes, Mendota and Monona. This data goes back to the mid-1800s, thanks to handwritten records in some very old log books (they're behind glass in Ed's office over in AOS).

Analyzing this data is an interesting exploration of the changes in our local climate over time, so let's use some machine learning techniques to take a look at the data.

Program Specification

As with most real problems, the data is not as clean or as organized as one would like for machine learning. You'll be retrieving the data for Lake Mendota from http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html and cleaning it for your program.

Write the following Python functions in a file called ice_cover.py (yes, there are a lot, but most of them are pretty short and have sample output):

  1. get_dataset() — takes no arguments and returns the data as described below in an n-by-2 array
  2. print_stats(dataset) — takes the dataset as produced by the previous function and prints several statistics about the data; does not return anything
  3. regression(beta_0, beta_1) — calculates and returns the mean squared error on the dataset given fixed betas
  4. gradient_descent(beta_0, beta_1) — performs a single step of gradient descent on the MSE and returns the derivative values as a tuple
  5. iterate_gradient(T, eta) — performs T iterations of gradient descent starting at $(\beta_0, \beta_1) = (0, 0)$ with the given parameter and prints the results; does not return anything
  6. compute_betas() — using the closed-form solution, calculates and returns the values of $\beta_0$ and $\beta_1$ and the corresponding MSE as a three-element tuple
  7. predict(year) — using the closed-form solution betas, return the predicted number of ice days for that year
  8. iterate_normalized(T, eta) — normalizes the data before performing gradient descent, prints results as in function 5
  9. sgd(T, eta) — performs stochastic gradient descent, prints results as in function 5

Get Dataset

The get_dataset() function should return an n-by-2 array of data, where n is the number of winters from 1855-56 through 2019-20.

Curate a clean data set starting from 1855-56 and ending in 2019-20. Let x be the beginning year: for 1855-56, x = 1855; for 2019-20, x = 2019; and so on. Let y be the total number of ice days in that year: for 1855-56, y = 118; for 2019-20, y = 70; and so on.

Note that some years have multiple freeze/thaw cycles, such as 2018-19. That year should be recorded as x = 2018, y = 86. Although we do not ask you to hand in any visualization code or figures, we strongly advise you to plot the data (using any plotting tool inside or outside Python) and see what it is like. Or you can mess around with PBS Wisconsin's Ice Cover visualization tool, which Hobbes thinks is pretty neat but that might be because she wrote it.

For simplicity, hard code the data set in your program. For the rest of this assignment we will refer to the year value (e.g. 1855, 1926, 2019) as x and the ice days value (e.g. 118, 103, 70) as y.

>>> get_dataset()
=> [[1855, 118],
     ...
    [1926, 103],
     ...
    [2019, 70]]
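A minimal sketch of the expected shape is below. The rows other than the three shown in the spec are placeholders; you must curate the full list yourself from the linked page.

def get_dataset():
    # Hard-coded n-by-2 dataset: one [year, ice_days] row per winter
    # from 1855-56 through 2019-20.
    return [
        [1855, 118],
        # ... one [year, ice_days] row per intervening winter ...
        [2019, 70],
    ]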

Dataset Statistics

This is just a quick summary function for the above dataset. When called, you should print:

  1. the number of data points
  2. the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$
  3. the sample standard deviation $\sqrt{\frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2}$

on three lines. Please format your output to include only TWO digits after the decimal point. For example (numbers are made up and do not correspond to actual output, you will need to calculate these results yourself):

>>> data = get_dataset()
>>> print_stats(data)
165
123.45
32.10
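One possible sketch of this function, assuming the get_dataset() defined above and using only the standard library:

import math

def print_stats(dataset):
    # Number of points, sample mean of y, and sample standard deviation of y,
    # each printed on its own line with two digits after the decimal point.
    n = len(dataset)
    ys = [y for _, y in dataset]
    mean = sum(ys) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / (n - 1))
    print(n)
    print('{:.2f}'.format(mean))
    print('{:.2f}'.format(std))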

Linear Regression

This function will perform linear regression with the model

$f(x) = \beta_0 + \beta_1 x$

We first define the mean squared error (MSE) as a function of the betas:

$\text{MSE}(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^n (\beta_0 + \beta_1 x_i - y_i)^2$

The two arguments for this function represent these two betas. Return the corresponding MSE as calculated on your dataset (which you should retrieve within your function by calling your get_dataset() function).

>>> regression(0,0)
=> 10827.78
>>> regression(100,0)
=> 386.57
>>> regression(300,-.1)
=> 332.83
>>> regression(400,.1)
=> 242059.01
>>> regression(200,-.2)
=> 84167.47

Note that I’m continuing to round these values to two decimal places for clarity; your returned values should be the full floats.
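One way this might look in code, again assuming the get_dataset() defined earlier:

def regression(beta_0, beta_1):
    # Mean squared error of the line beta_0 + beta_1 * x over the dataset.
    data = get_dataset()
    n = len(data)
    return sum((beta_0 + beta_1 * x - y) ** 2 for x, y in data) / n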

Gradient Descent

This function will perform gradient descent on the MSE. At the current parameter $(\beta_0, \beta_1)$, the gradient is defined by the vector of partial derivatives:

$\frac{\partial\,\text{MSE}(\beta_0, \beta_1)}{\partial \beta_0} = \frac{2}{n}\sum_{i=1}^n (\beta_0 + \beta_1 x_i - y_i)$
$\frac{\partial\,\text{MSE}(\beta_0, \beta_1)}{\partial \beta_1} = \frac{2}{n}\sum_{i=1}^n (\beta_0 + \beta_1 x_i - y_i)\,x_i$

This function returns the corresponding gradient as a tuple with the partial derivative with respect to $\beta_0$ as the first value.

>>> gradient_descent(0,0)
=> (-204.41, -395063.04)
>>> gradient_descent(100,0)
=> (-4.41, -7663.04)
>>> gradient_descent(300,-.1)
=> (8.19, 16289.42)
>>> gradient_descent(400,.1)
=> (982.99, 1905384.49)
>>> gradient_descent(200,-.2)
=> (-579.21, -1121958.11)
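A sketch of that computation follows. Note that despite its name, this function only evaluates the gradient at the given betas; the actual descent steps happen in iterate_gradient().

def gradient_descent(beta_0, beta_1):
    # Partial derivatives of the MSE with respect to beta_0 and beta_1.
    data = get_dataset()
    n = len(data)
    grad_0 = (2 / n) * sum(beta_0 + beta_1 * x - y for x, y in data)
    grad_1 = (2 / n) * sum((beta_0 + beta_1 * x - y) * x for x, y in data)
    return (grad_0, grad_1)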

Iterate Gradient

Gradient descent starts from an initial parameter $\left(\beta_0^{(0)}, \beta_1^{(0)}\right)$ and iterates the following updates at time t = 1, 2, … , T:

$\beta_0^{(t)} := \beta_0^{(t-1)} - \eta\,\frac{\partial\,\text{MSE}\left(\beta_0^{(t-1)}, \beta_1^{(t-1)}\right)}{\partial \beta_0}$

$\beta_1^{(t)} := \beta_1^{(t-1)} - \eta\,\frac{\partial\,\text{MSE}\left(\beta_0^{(t-1)}, \beta_1^{(t-1)}\right)}{\partial \beta_1}$

The parameters to this function are T, the number of iterations to perform, and $\eta$ (eta), the step size used in the updates above. Always begin from the initial parameter $\left(\beta_0^{(0)}, \beta_1^{(0)}\right) := (0, 0)$.

Print the following for each iteration on one line, separated by spaces:

  1. the current iteration number beginning at 1 and ending at T
  2. the current value of beta_0
  3. the current value of beta_1
  4. the current MSE

As before, all floating point values should be rounded to two digits for output.

>>> iterate_gradient(5, 1e-7)
1 0.00 0.04 1079.72
2 0.00 0.05 474.59
3 0.00 0.05 437.03
4 0.00 0.05 434.69
5 0.00 0.05 434.55
>>> iterate_gradient(5, 1e-8)
1 0.00 0.00 9325.63
2 0.00 0.01 8040.58
3 0.00 0.01 6941.27
4 0.00 0.01 6000.84
5 0.00 0.02 5196.33
>>> iterate_gradient(5, 1e-9)
1 0.00 0.00 10672.29
2 0.00 0.00 10519.13
3 0.00 0.00 10368.26
4 0.00 0.00 10219.64
5 0.00 0.00 10073.25
>>> iterate_gradient(5, 1e-6)
1 0.00 0.40 440695.17
2 -0.00 -2.18 18649996.73
3 0.01 14.56 790001058.39
4 -0.05 -94.36 33464645835.46
5 0.32 614.54 1417571655440.93

Note that with eta = 1e-6 gradient descent is diverging! Try different values for eta and a much larger T, and see how small you can make MSE (optional).
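A sketch of the update loop, reusing the gradient_descent() and regression() functions described above:

def iterate_gradient(T, eta):
    # Run T gradient descent steps from (0, 0), printing the iteration
    # number, the current betas, and the current MSE on one line each.
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        grad_0, grad_1 = gradient_descent(beta_0, beta_1)
        beta_0 -= eta * grad_0
        beta_1 -= eta * grad_1
        mse = regression(beta_0, beta_1)
        print('{} {:.2f} {:.2f} {:.2f}'.format(t, beta_0, beta_1, mse))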

Compute Betas

Instead of using gradient descent, we can compute the closed-form solution for the parameters directly. For ordinary least squares in 1D, this is

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where $\bar{x} = \frac{1}{n}\sum_i x_i$ and $\bar{y} = \frac{1}{n}\sum_i y_i$.

This function returns the calculated betas and their corresponding MSE in a tuple, as (beta_0, beta_1, MSE).

And no, I’m not going to tell you what the answers are. Do the math.
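Still, here is one way the closed-form computation might be structured, assuming the helpers defined earlier (the numbers it produces are for you to work out):

def compute_betas():
    # Ordinary least squares in 1D: slope from centered cross-products,
    # intercept from the sample means, plus the resulting MSE.
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    beta_1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
              / sum((x - x_bar) ** 2 for x, _ in data))
    beta_0 = y_bar - beta_1 * x_bar
    return (beta_0, beta_1, regression(beta_0, beta_1))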

Predict Ice Cover

Using your closed-form betas, predict the number of ice days for a future year. Return that value.

For example:

>>> predict(2021)
=> 85.85
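A sketch of predict(), which simply plugs the year into the closed-form model from compute_betas():

def predict(year):
    # Predicted number of ice days for the given year under the fitted line.
    beta_0, beta_1, _ = compute_betas()
    return beta_0 + beta_1 * year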

Food for thought (you don’t need to write this up or turn it in, but if you want to discuss it on Piazza please do):

  • What’s the prediction for next year? five years from now?
  • In which year will the predicted number of ice days become negative?
  • What does this say about our model?

Normalized Gradient Descent

Can’t get your iterating gradient descent to match the closed-form solution? You’re not alone! The culprit is the scale of the input x compared to the implicit offset value 1 (think $\beta_0 := \beta_0 \cdot 1$). Gradient descent converges slowly when these scales differ greatly, a situation known as a bad condition number in optimization.

For this function, first normalize your x values (NOT the y values):

$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$
$\text{std}_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}$
$x_i \leftarrow (x_i - \bar{x}) / \text{std}_x$

then proceed exactly as in iterate_gradient():

>>> iterate_normalized(5, 0.1)
1 20.44 -1.85 7036.41
2 36.79 -3.33 4609.88
3 49.88 -4.52 3056.86
4 60.34 -5.47 2062.90
5 68.72 -6.23 1426.75
>>> iterate_normalized(5, 0.01)
1 2.04 -0.18 10410.73
2 4.05 -0.37 10010.20
3 6.01 -0.54 9625.53
4 7.93 -0.72 9256.08
5 9.82 -0.89 8901.27

With $\eta = 0.1$ you should get convergence within 100 iterations. (Note that the betas are now for the normalized version of x, but you can translate them back to the original x with a little algebra. This isn’t required for the homework.)
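A sketch of the normalized variant: standardize the x values once, then run the same loop as iterate_gradient() on the normalized data.

import math

def iterate_normalized(T, eta):
    # Same gradient descent loop, but on x values standardized to zero
    # mean and unit sample standard deviation.
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    std_x = math.sqrt(sum((x - x_bar) ** 2 for x, _ in data) / (n - 1))
    norm = [((x - x_bar) / std_x, y) for x, y in data]
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        grad_0 = (2 / n) * sum(beta_0 + beta_1 * x - y for x, y in norm)
        grad_1 = (2 / n) * sum((beta_0 + beta_1 * x - y) * x for x, y in norm)
        beta_0 -= eta * grad_0
        beta_1 -= eta * grad_1
        mse = sum((beta_0 + beta_1 * x - y) ** 2 for x, y in norm) / n
        print('{} {:.2f} {:.2f} {:.2f}'.format(t, beta_0, beta_1, mse))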

Stochastic Gradient Descent

Now let’s have some fun with randomness and implement Stochastic Gradient Descent (SGD). With everything the same as part 8 (including normalization), modify the gradient as follows: in iteration t, randomly pick ONE of the n items (call it $(x_{j_t}, y_{j_t})$), and approximate the gradient using only that item.

$\frac{\partial\,\text{MSE}(\beta_0, \beta_1)}{\partial \beta_0} \approx 2(\beta_0 + \beta_1 x_{j_t} - y_{j_t})$
$\frac{\partial\,\text{MSE}(\beta_0, \beta_1)}{\partial \beta_1} \approx 2(\beta_0 + \beta_1 x_{j_t} - y_{j_t})\,x_{j_t}$

Print the same information as in the previous function. For example (your results WILL differ because of randomness in the items selected):

>>> sgd(5, 0.1)
1 17.60 14.00 7993.47
2 36.50 24.68 5760.88
3 51.86 8.92 3160.46
4 70.23 -17.22 1380.68
5 82.74 -6.49 682.61

With $\eta = 0.1$ you should approximately converge within a few hundred iterations.

Since n is small in our dataset, there is little advantage of SGD over standard gradient descent. However, on large datasets, SGD becomes much more desirable…
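A sketch of sgd(): the same normalization and loop as iterate_normalized(), except each step approximates the gradient with a single randomly chosen point (so your printed numbers will differ from run to run).

import math
import random

def sgd(T, eta):
    # Stochastic gradient descent on the normalized data: each step uses
    # one randomly chosen point to approximate the gradient.
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    std_x = math.sqrt(sum((x - x_bar) ** 2 for x, _ in data) / (n - 1))
    norm = [((x - x_bar) / std_x, y) for x, y in data]
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        x_j, y_j = random.choice(norm)
        resid = beta_0 + beta_1 * x_j - y_j
        beta_0 -= eta * 2 * resid
        beta_1 -= eta * 2 * resid * x_j
        mse = sum((beta_0 + beta_1 * x - y) ** 2 for x, y in norm) / n
        print('{} {:.2f} {:.2f} {:.2f}'.format(t, beta_0, beta_1, mse))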

Submission

Please submit your code in a file called ice_cover.py. All code should be contained in functions or under a

if __name__ == "__main__":

check so that it will not run if your code is imported into another program.
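For example, a quick manual check could live under the guard so that importing ice_cover.py does not execute it (the specific calls here are just an illustration):

if __name__ == "__main__":
    data = get_dataset()
    print_stats(data)
    print(compute_betas())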
