Assignment Goals
- Implement a linear regression calculation
- Examine the trends in real (messy) data
Summary
Ed Hopkins over at the Wisconsin State Climatology Office maintains a listing of the dates and durations of full-freeze ice covers on our dear Madison lakes, Mendota and Monona, found here. This data goes back to the mid-1800s, thanks to handwritten records in some very old log books (they're behind glass in Ed's office over in AOS).
Analyzing this data is an interesting exploration in the changes in our local climate over time, so let’s use some machine learning techniques to take a look at the data.
Program Specification
As with most real problems, the data is not as clean or as organized as one would like for machine learning. You'll be retrieving the data for Lake Mendota from http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html and cleaning it for your program.
Write the following Python functions in a file called ice_cover.py (yes, there are a lot, but most of them are pretty short and have sample output):
- get_dataset() — takes no arguments and returns the data as described below in an n-by-2 array
- print_stats(dataset) — takes the dataset as produced by the previous function and prints several statistics about the data; does not return anything
- regression(beta_0, beta_1) — calculates and returns the mean squared error on the dataset given fixed betas
- gradient_descent(beta_0, beta_1) — computes the gradient of the MSE at the given betas and returns the two partial derivatives as a tuple
- iterate_gradient(T, eta) — performs T iterations of gradient descent starting at (beta_0, beta_1) = (0, 0) with the given learning rate eta and prints the results; does not return anything
- compute_betas() — using the closed-form solution, calculates and returns the values of beta_0 and beta_1 and the corresponding MSE as a three-element tuple
- predict(year) — using the closed-form solution betas, return the predicted number of ice days for that year
- iterate_normalized(T, eta) — normalizes the data before performing gradient descent, prints results as in function 5
- sgd(T, eta) — performs stochastic gradient descent, prints results as in function 5
Get Dataset
The get_dataset() function should return an n-by-2 array of data, where n is the number of winters between 1855 and 2019.
Curate a clean data set starting from 1855-56 and ending in 2019-20. Let x be the beginning year: for 1855-56, x = 1855; for 2019-20, x = 2019; and so on. Let y be the total number of ice days in that winter: for 1855-56, y = 118; for 2019-20, y = 70; and so on.
Note that some years have multiple freeze/thaw cycles, such as 2018-19. That year should be recorded as x = 2018, y = 86. Although we do not ask you to hand in any visualization code or figures, we strongly advise you to plot the data (using any plotting tool inside or outside Python) and see what it is like. Or you can mess around with PBS Wisconsin's Ice Cover visualization tool, which Hobbes thinks is pretty neat but that might be because she wrote it.
For simplicity, hard code the data set in your program. For the rest of this assignment we will refer to the year value (e.g. 1855, 1926, 2019) as x and the ice days value (e.g. 118, 103, 70) as y.
>>> get_dataset() => [[1855, 118], ... [1926, 103], ... [2019, 70]]
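If it helps to see the shape of the hard-coded approach, here is a minimal sketch. Only the values quoted above are filled in; the real function must list every winter from 1855 through 2019, and the comments mark where your own data entry goes:

```python
def get_dataset():
    """Return the hard-coded Lake Mendota data as an n-by-2 list of [year, ice days] pairs."""
    data = [
        [1855, 118],
        # ... one row per winter; fill in the remaining years from the SCO listing ...
        [1926, 103],
        # ...
        [2018, 86],
        [2019, 70],
    ]
    return data
```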
Dataset Statistics
This is just a quick summary function for the above dataset. When called, you should print:
- the number of data points
- the sample mean
- the sample standard deviation
on three lines. Please format your output to include only TWO digits after the decimal point. For example (numbers are made up and do not correspond to actual output, you will need to calculate these results yourself):
>>> data = get_dataset()
>>> print_stats(data)
165
123.45
32.10
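One possible shape for this function, as a non-authoritative sketch: it assumes the statistics are computed over the ice-day (y) values and that the sample standard deviation uses the n - 1 denominator. Confirm both conventions against the expected output before submitting.

```python
import math

def print_stats(dataset):
    """Print the number of data points, sample mean, and sample standard deviation of the ice days."""
    n = len(dataset)
    ys = [y for _, y in dataset]
    mean = sum(ys) / n
    # Sample standard deviation with the n - 1 denominator (an assumption; check the expected convention).
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / (n - 1))
    print(n)
    print('{:.2f}'.format(mean))
    print('{:.2f}'.format(std))
```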
Linear Regression
This function will perform linear regression with the model

\hat{y} = \beta_0 + \beta_1 x

We first define the mean squared error (MSE) as a function of the betas:

MSE(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)^2
The two arguments for this function represent these two betas. Return the corresponding MSE as calculated on your dataset (which you should retrieve within your function by calling your get_dataset() function).
>>> regression(0,0) => 10827.78
>>> regression(100,0) => 386.57
>>> regression(300,-.1) => 332.83
>>> regression(400,.1) => 242059.01
>>> regression(200,-.2) => 84167.47
Note that I’m continuing to round these values to two decimal places for clarity; your returned values should be the full floats.
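A minimal sketch of one way to compute this, assuming the MSE definition above and the get_dataset() function you wrote earlier:

```python
def regression(beta_0, beta_1):
    """Return the MSE of the model y ~ beta_0 + beta_1 * x over the full dataset."""
    data = get_dataset()
    n = len(data)
    return sum((beta_0 + beta_1 * x - y) ** 2 for x, y in data) / n
```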
Gradient Descent
This function will perform gradient descent on the MSE. At the current parameter (beta_0, beta_1), the gradient is defined by the vector of partial derivatives:

\frac{\partial MSE}{\partial \beta_0} = \frac{2}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)

\frac{\partial MSE}{\partial \beta_1} = \frac{2}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i) x_i

This function returns the corresponding gradient as a tuple, with the partial derivative with respect to beta_0 as the first value.
>>> gradient_descent(0,0) => (-204.41, -395063.04)
>>> gradient_descent(100,0) => (-4.41, -7663.04)
>>> gradient_descent(300,-.1) => (8.19, 16289.42)
>>> gradient_descent(400,.1) => (982.99, 1905384.49)
>>> gradient_descent(200,-.2) => (-579.21, -1121958.11)
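A sketch of the gradient computation, assuming the partial derivatives above; it reuses get_dataset() and returns the beta_0 derivative first:

```python
def gradient_descent(beta_0, beta_1):
    """Return (dMSE/dbeta_0, dMSE/dbeta_1) evaluated at the given betas."""
    data = get_dataset()
    n = len(data)
    residuals = [beta_0 + beta_1 * x - y for x, y in data]
    grad_b0 = 2 * sum(residuals) / n
    grad_b1 = 2 * sum(r * x for r, (x, _) in zip(residuals, data)) / n
    return (grad_b0, grad_b1)
```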
Iterate Gradient
Gradient descent starts from an initial parameter (beta_0, beta_1) and iterates the following updates at time t = 1, 2, ..., T:

\beta_0^{(t)} = \beta_0^{(t-1)} - \eta \, \frac{\partial MSE}{\partial \beta_0} \Big|_{(\beta_0^{(t-1)},\, \beta_1^{(t-1)})}

\beta_1^{(t)} = \beta_1^{(t-1)} - \eta \, \frac{\partial MSE}{\partial \beta_1} \Big|_{(\beta_0^{(t-1)},\, \beta_1^{(t-1)})}

The parameters to this function are T, the number of iterations to perform, and eta, the learning rate in the above updates. Always begin from the initial parameter (beta_0, beta_1) = (0, 0).
Print the following for each iteration on one line, separated by spaces:
- the current iteration number beginning at 1 and ending at T
- the current value of beta_0
- the current value of beta_1
- the current MSE
As before, all floating point values should be rounded to two digits for output.
>>> iterate_gradient(5, 1e-7)
1 0.00 0.04 1079.72
2 0.00 0.05 474.59
3 0.00 0.05 437.03
4 0.00 0.05 434.69
5 0.00 0.05 434.55
>>> iterate_gradient(5, 1e-8)
1 0.00 0.00 9325.63
2 0.00 0.01 8040.58
3 0.00 0.01 6941.27
4 0.00 0.01 6000.84
5 0.00 0.02 5196.33
>>> iterate_gradient(5, 1e-9)
1 0.00 0.00 10672.29
2 0.00 0.00 10519.13
3 0.00 0.00 10368.26
4 0.00 0.00 10219.64
5 0.00 0.00 10073.25
>>> iterate_gradient(5, 1e-6)
1 0.00 0.40 440695.17
2 -0.00 -2.18 18649996.73
3 0.01 14.56 790001058.39
4 -0.05 -94.36 33464645835.46
5 0.32 614.54 1417571655440.93
Note that with eta = 1e-6 gradient descent is diverging! Try different values for eta and a much larger T, and see how small you can make MSE (optional).
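One way to structure the loop, as a sketch that assumes the gradient_descent() and regression() helpers above:

```python
def iterate_gradient(T, eta):
    """Run T steps of gradient descent from (0, 0), printing progress after each step."""
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        grad_b0, grad_b1 = gradient_descent(beta_0, beta_1)
        beta_0 -= eta * grad_b0
        beta_1 -= eta * grad_b1
        print(t, '{:.2f}'.format(beta_0), '{:.2f}'.format(beta_1),
              '{:.2f}'.format(regression(beta_0, beta_1)))
```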
Compute Betas
Instead of using gradient descent, we can compute the closed-form solution for the parameters directly. For ordinary least squares in 1D, this is

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i and \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.
This function returns the calculated betas and their corresponding MSE in a tuple, as (beta_0, beta_1, MSE).
And no, I’m not going to tell you what the answers are. Do the math.
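If you want a starting point, here is a sketch of the closed-form computation, assuming the formulas above and reusing regression() for the MSE (the answers themselves are still yours to work out):

```python
def compute_betas():
    """Return (beta_0, beta_1, MSE) from the closed-form least-squares solution."""
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    beta_1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
              / sum((x - x_bar) ** 2 for x, _ in data))
    beta_0 = y_bar - beta_1 * x_bar
    return (beta_0, beta_1, regression(beta_0, beta_1))
```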
Predict Ice Cover
Using your closed-form betas, predict the number of ice days for a future year. Return that value.
For example:
>>> predict(2021) => 85.85
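A short sketch, assuming the compute_betas() function above:

```python
def predict(year):
    """Predict the number of ice days for the given year using the closed-form betas."""
    beta_0, beta_1, _ = compute_betas()
    return beta_0 + beta_1 * year
```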
Food for thought (you don’t need to write this up or turn it in, but if you want to discuss it on Piazza please do):
- What’s the prediction for next year? five years from now?
- Which year will the predicted number of ice days become negative?
- What does this say about our model?
Normalized Gradient Descent
Can't get your iterate_gradient() results to match the closed-form solution? You're not alone! The culprit is the scale of the input x (years on the order of 2000) compared to the implicit offset value 1 that multiplies beta_0 (think \hat{y} = \beta_0 \cdot 1 + \beta_1 x). Gradient descent converges slowly when these scales differ greatly, a situation known as a bad condition number in optimization.
For this function, first normalize your x values (NOT the y values):

x_i \leftarrow \frac{x_i - \bar{x}}{\sigma_x}

where \bar{x} and \sigma_x are the sample mean and sample standard deviation of the x values, then proceed exactly as in iterate_gradient():
>>> iterate_normalized(5, 0.1)
1 20.44 -1.85 7036.41
2 36.79 -3.33 4609.88
3 49.88 -4.52 3056.86
4 60.34 -5.47 2062.90
5 68.72 -6.23 1426.75
>>> iterate_normalized(5, 0.01)
1 2.04 -0.18 10410.73
2 4.05 -0.37 10010.20
3 6.01 -0.54 9625.53
4 7.93 -0.72 9256.08
5 9.82 -0.89 8901.27
With a well-chosen eta you should get convergence within 100 iterations. (Note that the betas are now for the normalized version of x, but you can translate them back to the original x with a little algebra. This isn't required for the homework.)
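A sketch of the normalized variant; it assumes the mean/standard-deviation normalization described above (the n - 1 denominator for the standard deviation is an assumption, so match whatever convention reproduces the expected output):

```python
def iterate_normalized(T, eta):
    """Gradient descent on (x - mean) / std normalized years, printing as in iterate_gradient()."""
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    std = (sum((x - x_bar) ** 2 for x, _ in data) / (n - 1)) ** 0.5
    norm = [((x - x_bar) / std, y) for x, y in data]
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        # Residuals at the current betas drive both updates (simultaneous update).
        residuals = [beta_0 + beta_1 * x - y for x, y in norm]
        beta_0 -= eta * 2 * sum(residuals) / n
        beta_1 -= eta * 2 * sum(r * x for r, (x, _) in zip(residuals, norm)) / n
        mse = sum((beta_0 + beta_1 * x - y) ** 2 for x, y in norm) / n
        print(t, '{:.2f}'.format(beta_0), '{:.2f}'.format(beta_1), '{:.2f}'.format(mse))
```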
Stochastic Gradient Descent
Now let's have some fun with randomness and implement Stochastic Gradient Descent (SGD). With everything the same as part 8 (including normalization), modify the gradient as follows: in iteration t, randomly pick ONE of the n items (call it j_t) and approximate the gradient using only that item.
Print the same information as in the previous function. For example (your results WILL differ because of randomness in the items selected):
>>> sgd(5, 0.1)
1 17.60 14.00 7993.47
2 36.50 24.68 5760.88
3 51.86 8.92 3160.46
4 70.23 -17.22 1380.68
5 82.74 -6.49 682.61
With a well-chosen eta you should approximately converge within a few hundred iterations.
Since n is small in our dataset, there is little advantage of SGD over standard gradient descent. However, on large datasets, SGD becomes much more desirable…
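A sketch of the stochastic version, assuming Python's random module for item selection and the same normalization as the previous function; the MSE over the full normalized dataset is still printed each iteration:

```python
import random

def sgd(T, eta):
    """Stochastic gradient descent on the normalized data, printing as in iterate_normalized()."""
    data = get_dataset()
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    std = (sum((x - x_bar) ** 2 for x, _ in data) / (n - 1)) ** 0.5
    norm = [((x - x_bar) / std, y) for x, y in data]
    beta_0, beta_1 = 0.0, 0.0
    for t in range(1, T + 1):
        # Approximate the gradient with a single randomly chosen data point.
        x_j, y_j = random.choice(norm)
        residual = beta_0 + beta_1 * x_j - y_j
        beta_0 -= eta * 2 * residual
        beta_1 -= eta * 2 * residual * x_j
        mse = sum((beta_0 + beta_1 * x - y) ** 2 for x, y in norm) / n
        print(t, '{:.2f}'.format(beta_0), '{:.2f}'.format(beta_1), '{:.2f}'.format(mse))
```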
Submission
Please submit your code in a file called ice_cover.py. All code should be contained in functions or placed under an if __name__ == "__main__": check so that it will not run if your code is imported into another program.