HW5: Temperature Anomaly
Note: The lectures on files, data files, and lists in Units 5 and 6, along with the corresponding readings, contain the background for this assignment.
This assignment will be graded for coding style and best practices. Comment your submissions in your own words.
In this assignment, we will look at climate change data from the National Oceanic and Atmospheric Administration (NOAA). In particular, we will look at the temperature anomaly over the last century. By “anomaly,” NOAA means the difference between the temperature for a given year and the long-term 20th-century average temperature. The figure below, from a recent article, illustrates this data for Northern California. It uses a popular statistical technique, the moving average (https://en.wikipedia.org/wiki/Moving_average), to filter out short-term noise in the data so that the longer-term trend is more apparent.
Average July minimum temperatures (overnight lows) in California (light orange line) from 1895–2018. The trend over the historical record is shown in dark orange, and the recent trend (2000–2018) is shown in red. The twentieth-century average is shown with a gray line. NOAA Climate.gov graph, based on data from NCEI’s Climate at a Glance (https://www.ncdc.noaa.gov/cag/global/time-series).
For this project you will analyze publicly available CSV formatted data for local July temperatures going back to the year 1880. First, you will write a program to compute basic statistics about the temperatures in the file. Then, you will write a program to compute a moving average and write that result to another CSV formatted file for plotting in a spreadsheet.
We have broken this project down into separately graded incremental steps so you can iteratively develop and test each part.
Part 1 (15 points) Opening and Reading the Temperature File
July temperature anomaly data for Sacramento, 1880-2018, plotted using Google Sheets.
Download the temperature anomaly data file for our region from Canvas:
SacramentoTemps.csv (https://canvas.ucdavis.edu/courses/911230/files/25030602?wrap=1) (https://canvas.ucdavis.edu/courses/911230/files/25030602/download?download_frd=1). It’s in the Homework5 folder under Files. Gradescope will also test your programs using a temperature anomaly data file for the Northern Hemisphere, NorthernTemps.csv (https://canvas.ucdavis.edu/courses/911230/files/25030605?wrap=1) (https://canvas.ucdavis.edu/courses/911230/files/25030605/download?download_frd=1). Look at these files in a text editor to see their format, rather than in a spreadsheet, which imports the data and can obscure or even change the format. Here are the first lines of SacramentoTemps.csv:
Year,Value
1880,-1.56
1881,-0.08
1882,-0.30
1883,-1.44
Your program will get a file name from the user and open it for reading. The first line is a column header that your program should read but ignore. Your program will then loop through the remaining lines of the file. In the body of that loop, use the strip() string method to remove the newline character from the end of each line, and the split() string method to split on the comma, extracting the year and the temperature into separate string variables. To test your code, add a print statement that prints the year and the temperature. Convert the temperature to a floating point number so that trailing zeros are dropped (for example, -0.30 prints as -0.3), matching the output expected by Gradescope. Separate the printed values with a space instead of the original comma.
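A minimal sketch of the reading loop, assuming the two-column Year,Value format shown above. The helper name read_temp_lines and the in-memory io.StringIO sample are hypothetical choices made so the sketch runs on its own; a real program would open the file the user names. Returning (year, temperature) pairs instead of printing inside the loop only makes the sketch easy to test.

```python
import io


def read_temp_lines(infile):
    """Return a list of (year, temperature) pairs from an open CSV file object."""
    infile.readline()  # read the "Year,Value" header line and ignore it
    pairs = []
    for line in infile:
        line = line.strip()  # remove the trailing newline
        if not line:
            continue  # skip a blank line, e.g. one at the end of the file
        year, temp = line.split(",")  # two strings: year and temperature
        pairs.append((year, float(temp)))  # float("-0.30") becomes -0.3
    return pairs


# Demo on an in-memory copy of the first lines of SacramentoTemps.csv.
# In a real program you would instead use something like:
#   filename = input("Temperature anomaly filename:")
#   with open(filename) as infile: ...
sample = io.StringIO("Year,Value\n1880,-1.56\n1881,-0.08\n1882,-0.30\n1883,-1.44\n")
for year, temp in read_temp_lines(sample):
    print(year, temp)  # print separates the values with a space
```

Run as shown, it prints the same four lines as the expected output below, including 1882 -0.3.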
The first lines of your program's output should look exactly like this:
Temperature anomaly filename:SacramentoTemps.csv
1880 -1.56
1881 -0.08
1882 -0.3
1883 -1.44
Submit your completed program to Gradescope as read_temp_file.py
Part 2 (20 points) Outlier Temperatures
We are interested in finding outlier temperatures and comparing them to the long run average.
Start a new program temp_file_stats.py from your working version of read_temp_file.py. Comment out the line that prints out the year and temperature for each year in the file.
Add code to find the minimum and maximum temperatures for all of the years in the file and the years in which they occurred. You can base your solution on the starbucks_menu.py program we went over in lecture. Here is a transcript of how the program should work:
Temperature anomaly filename:SacramentoTemps.csv
Min temp: -2.32 in 1913
Max temp: 2.99 in 1889
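The starbucks_menu.py program is not reproduced here, so the sketch below is simply one common way to track a running minimum and maximum together with the year each occurred in. It assumes the (year, temperature) pairs have already been read from the file; find_extremes is a hypothetical helper name, and the demo uses only the first four rows of SacramentoTemps.csv, not the whole file.

```python
def find_extremes(pairs):
    """Return (min_temp, min_year, max_temp, max_year) for (year, temp) pairs.

    Assumes pairs is non-empty. Strict comparisons keep the first year
    in which a tied minimum or maximum occurs.
    """
    min_year, min_temp = pairs[0]  # start with the first year as both extremes
    max_year, max_temp = pairs[0]
    for year, temp in pairs[1:]:
        if temp < min_temp:
            min_year, min_temp = year, temp
        if temp > max_temp:
            max_year, max_temp = year, temp
    return min_temp, min_year, max_temp, max_year


# Demo on the first four rows of SacramentoTemps.csv only
sample = [("1880", -1.56), ("1881", -0.08), ("1882", -0.3), ("1883", -1.44)]
min_temp, min_year, max_temp, max_year = find_extremes(sample)
print("Min temp:", min_temp, "in", min_year)  # Min temp: -1.56 in 1880
print("Max temp:", max_temp, "in", max_year)  # Max temp: -0.08 in 1881
```

A shorter alternative is min(pairs, key=lambda p: p[1]), which also returns the first pair with the smallest temperature; the explicit loop shows the bookkeeping more clearly.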
Submit the completed program to Gradescope as temp_file_stats.py
Part 3 (15 points) Reading the data into a list
You are surprised that the hottest year on record in Sacramento was more than a century ago! You decide that you need to compute a moving average to observe the long term trend in the data.
Start a new program temp_list.py from your working version of read_temp_file.py. Comment out the line that prints the year and temperature for each year in the file, and instead add the floating point temperatures one at a time to a growing Python list inside the for loop. Test your code by printing the list after the loop that reads the file.
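A minimal sketch of building the list, assuming the same Year,Value format. The helper name read_temp_list and the in-memory sample are hypothetical, used so the sketch runs without the data file.

```python
import io


def read_temp_list(infile):
    """Return the temperatures from an open Year,Value CSV file as a list of floats."""
    infile.readline()  # skip the header line
    temps = []  # grows by one element per data line
    for line in infile:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        year, temp = line.split(",")  # year is unused here but kept for clarity
        temps.append(float(temp))
    return temps


sample = io.StringIO("Year,Value\n1880,-1.56\n1881,-0.08\n1882,-0.30\n1883,-1.44\n")
temps = read_temp_list(sample)
print(temps)  # [-1.56, -0.08, -0.3, -1.44]
```

Because each row is one year, list index i corresponds to year 1880 + i, assuming (as the later parts do) that the file starts in 1880 and has no missing years.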
Below is an example of how your program's output should look:
Temperature anomaly filename:SacramentoTemps.csv
[-1.56, -0.08, -0.3, -1.44, …, -0.06, -0.4, 0.48, 2.63, 0.18]
Submit the completed program to Gradescope as temp_list.py
Part 4 (15 points) Moving average part 1: first window
Start a new program first_ave.py from your working version of temp_list.py. Comment out the line that prints out the list of temperatures.
For the moving average, we will let the user enter an integer k. Then, for each year in our data file, we will calculate the average of the k years before it, the year itself, and the k years after it. We can visualize this process as a window that slides across our temperature data.
In the figure below the window is represented by the curly brackets. The window is centered on an index in the temperature list. We are interested in calculating the average temperature for the values in the window and the year on which the window is centered. The window slides across the temperature list, and at each new position we calculate the average and the year.
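Written as a formula (the symbols here are our own notation, not part of the assignment): with T_j standing for temps[j], list indices starting at 0, and n = len(temps), the average for the window centered at index i is the mean of its 2k+1 values, and it is only defined when the whole window fits inside the list. The second line works the first valid window for k = 1 using the first three values of SacramentoTemps.csv, which corresponds to the year 1880 + 1 = 1881.

```latex
\bar{T}_i = \frac{1}{2k+1} \sum_{j=i-k}^{i+k} T_j, \qquad k \le i \le n-1-k

\bar{T}_1 = \frac{(-1.56) + (-0.08) + (-0.30)}{3} = \frac{-1.94}{3} \approx -0.6467
```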
Before starting the moving average calculation, you will first get the window size parameter k from the user. You may assume that the user will always enter a valid integer between 0 and 60. For example:
Enter window size:20
You can see in the figure that the moving average calculation is not valid for years near the ends of the list. We must have at least k years before and k years after the year we are computing the average for. In the first part we will focus on just calculating the average and the year for the first window.
Prototype the Calculation
We will start with implementing the first window to make sure the calculation works before moving to the more difficult problem of sliding the window.
For the first valid year in your temperature list, you will calculate the average of the k years before, the year itself, and the k years after. The easiest way to do this is with a list slice. The following three statements compute the year and the moving average for the first valid index in a list of temperatures called temps.
index = k
year = 1880 + index
ave = sum(temps[index-k:index+k+1]) / (2*k+1)
You should be able to insert these statements at the end of your program to calculate the year and the average. The output of your program for a window size of 20 will eventually look exactly like this:
Temperature anomaly filename:SacramentoTemps.csv
Enter window size:20
1900,-0.4171
Once you get the calculations for year and average working, format the output so that the values are separated by a comma and the average temperature is printed with exactly four decimal places using the format string method.
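A small sketch of the first-window calculation with the requested output format, wrapping the three statements above in a function. The name first_window and the first_year parameter are hypothetical additions; the default of 1880 matches the statements above and assumes the data starts in 1880. The demo uses k = 1 on the first four values of SacramentoTemps.csv rather than the real k = 20 run.

```python
def first_window(temps, k, first_year=1880):
    """Return (year, average) for the first valid window, centered at index k."""
    index = k  # first index with k values on each side
    year = first_year + index  # assumes temps[0] is first_year, one value per year
    # the slice end index + k + 1 is exclusive, so the slice holds 2k + 1 values
    ave = sum(temps[index - k:index + k + 1]) / (2 * k + 1)
    return year, ave


temps = [-1.56, -0.08, -0.3, -1.44]  # first four values of SacramentoTemps.csv
year, ave = first_window(temps, 1)
print("{},{:.4f}".format(year, ave))  # 1881,-0.6467
```

The format specifier {:.4f} always prints exactly four decimal places, padding with zeros when needed.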
Submit the completed program to Gradescope as first_ave.py
Part 5 (10 points) Moving average part 2: slide the window
For this part, we move the window. Start a new program moving_ave.py from your working version of first_ave.py.
We must have at least k years before and k years after the year we are computing the average for, so write a loop that calculates the average only for valid list indices. Place this loop after the loop that reads the data from the file into a list. The iteration variable will “move” from the lowest valid index to the highest valid index. In the body of the loop, the program should calculate and print the year and the moving average. In our example, the iteration variable is called index, the lowest valid list index is k, and the highest valid list index is len(temps) - 1 - k.
Below is an outline of how the sliding window can be implemented using a for loop. Fill in the call to the range function so that it produces the sequence of integers from k to len(temps) - 1 - k, inclusive. The calculations for year and ave given for the previous part can be generalized (both should be functions of index) and indented in the loop body to calculate the year and the moving average.
# loop slides the window from index k to len(temps) - 1 - k
# for each index we calculate the corresponding year and
# the average of the elements from temps[index-k] to temps[index+k] inclusive
for index in range(___________):
    # calculate year from index
    # calculate average for the window centered at index
    # print year,average
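One way to fill in the outline above, as a sketch: range stops before its end value, so range(k, len(temps) - k) ends at len(temps) - 1 - k. The function name moving_average, the first_year parameter (default 1880, as in the year formula above), and collecting results in a list rather than printing directly are hypothetical choices that make the sketch testable. The demo uses k = 1 on the first four values of SacramentoTemps.csv.

```python
def moving_average(temps, k, first_year=1880):
    """Return a list of (year, average) pairs for every valid window center."""
    results = []
    # range excludes its end value, so the last index is len(temps) - 1 - k,
    # the highest index that still has k values after it
    for index in range(k, len(temps) - k):
        year = first_year + index  # same year formula as the first window
        ave = sum(temps[index - k:index + k + 1]) / (2 * k + 1)
        results.append((year, ave))
    return results


temps = [-1.56, -0.08, -0.3, -1.44]  # first four values of SacramentoTemps.csv
for year, ave in moving_average(temps, 1):
    print("{},{:.4f}".format(year, ave))
# 1881,-0.6467
# 1882,-0.6067
```

Each window is re-summed from scratch, which costs about 2k+1 additions per year. A running sum would avoid that, but for roughly 139 yearly values the simpler slice version is fast enough. If 2k+1 is larger than the list, the range is empty and nothing is printed.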
Here is a short example of the output from your program when the user chooses to average over a really long time scale of k = 60 years:
Temperature anomaly filename:SacramentoTemps.csv
Enter window size:60
1940,-0.2331
1941,-0.2169
1942,-0.2150
1943,-0.2228
1944,-0.2107
1945,-0.1796
1946,-0.1667
1947,-0.1582
1948,-0.1585
1949,-0.1492
1950,-0.1711
1951,-0.1688
1952,-0.1490
1953,-0.1556
1954,-0.1548
1955,-0.1580
1956,-0.1420
1957,-0.1101
1958,-0.1017
Submit the completed program to Gradescope as moving_ave.py
Part 6 (15 points) Create a .csv file
Start a new program moving_ave_csv.py from your working version of moving_ave.py.
For the last program we will output a valid CSV file which you will always call MovingAve.csv.
Change your program so that, instead of printing the values to the screen (comment out the print statement), it writes them to the output file. For simplicity, don’t ask the user for the output file name; just call it MovingAve.csv. Start the file with a short one-line column header: “Year,Value\n”.
For example, suppose you run your program with the following inputs:
Temperature anomaly filename:SacramentoTemps.csv
Enter window size:60
Opened in a text editor, the output file should look like this:
Year,Value
1940,-0.2331
1941,-0.2169
1942,-0.2150
1943,-0.2228
:
1955,-0.1580
1956,-0.1420
1957,-0.1101
1958,-0.1017
Submit the completed program to Gradescope as moving_ave_csv.py
Gradescope will look for the output file named MovingAve.csv and include its contents at the end of the expected output.
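A minimal sketch of the file-writing step, assuming the (year, average) values have already been computed. The helper write_moving_average is hypothetical, and the demo writes to an in-memory io.StringIO buffer so it runs without creating a file; the two averages are the hand-computed k = 1 values for the first rows of SacramentoTemps.csv.

```python
import io


def write_moving_average(outfile, results):
    """Write a Year,Value header and one year,average line per result."""
    outfile.write("Year,Value\n")  # one-line column header
    for year, ave in results:
        # same formatting as the printed output: comma-separated, 4 decimals
        outfile.write("{},{:.4f}\n".format(year, ave))


# Demo with an in-memory buffer and two hand-computed k = 1 averages
results = [(1881, -1.94 / 3), (1882, -1.82 / 3)]
buffer = io.StringIO()
write_moving_average(buffer, results)
print(buffer.getvalue(), end="")
# Year,Value
# 1881,-0.6467
# 1882,-0.6067
```

In moving_ave_csv.py the same function could be called as `with open("MovingAve.csv", "w") as outfile: write_moving_average(outfile, results)`. Leaving the with block closes the file, which flushes any buffered writes so the file is complete when it is read back.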
Part 7 (10 points) Plotting the output
To create a plot, run your program on the data file SacramentoTemps.csv using a value of k equal to your current age, and make a nice plot in your favorite program (Excel, Sheets, Numbers, etc.). If you have never created a plot before, try Google Sheets, which was demonstrated in lecture. In Google Sheets, import the file MovingAve.csv. Once the file has been imported, select the data and choose Chart from the Insert menu. This will bring up a chart, which you can label accordingly. Your plot should include a title and labels on both axes.
Upload a screenshot, image, or PDF of your plot (from a spreadsheet or your updated program) along with your submission. Name your file Plot, with an extension indicating the file type (e.g. Plot.pdf) that is visible in the upload dialog box.
Submit the completed image/figure to Gradescope as Plot, Plot.png, Plot.pdf, Plot.jpg, etc. so the TAs can find it.
You will receive points after your submission has been manually graded by a TA. You will receive full credit if your figure includes the correct data, a title and labels on both axes.
Part 8 (2 points) Extra Challenge (Updated)
We posted a Jupyter notebook on Canvas that will be covered in discussion sections next week. It includes examples of using the Python plotting library Matplotlib. Start a new program plot_moving_ave.py from your working version of moving_ave.py. Adapt the code in the time series example to automatically display the result of the moving average as a plot (see the orange line below). If you cannot install Matplotlib on your own computer, you can test your program on Google Colab. Look up the syntax for Matplotlib’s plot function so that you can plot both the smoothed moving average (orange) and the original, non-smoothed data (blue). Your result should look similar to the example below.
Note that there are actually two lines plotted in the figure above. The smoothed temperatures are in orange and the original temperatures are in blue. In the example above, the original temperatures were only plotted for years where a smoothed result could be computed, but plotting all data from the file is another option. Plotting two lines independently can be done with four lists (x1, y1, x2, y2) representing the x and y coordinates of the two lines (smoothed and raw). The syntax for plotting those lists using Matplotlib’s plot function is:
plot(x1, y1, x2, y2)
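The notebook's time series example is not reproduced here, so the following is an independent sketch using matplotlib.pyplot.plot with the four-list syntax above. It takes the "plot all data from the file" option. The helper names build_series and plot_series, the title, and the axis labels are hypothetical placeholders. matplotlib is imported inside plot_series so that build_series can run even where Matplotlib is not installed; the demo only builds the series and does not open a window.

```python
def build_series(temps, k, first_year=1880):
    """Return x/y lists for the raw data and for the k-year moving average."""
    years = [first_year + i for i in range(len(temps))]  # x1: every year in the file
    smooth_years = []  # x2: only years with a full window
    smooth = []        # y2: the moving averages
    for index in range(k, len(temps) - k):
        smooth_years.append(first_year + index)
        smooth.append(sum(temps[index - k:index + k + 1]) / (2 * k + 1))
    return years, temps, smooth_years, smooth


def plot_series(years, raw, smooth_years, smooth, k):
    """Draw raw and smoothed temperatures as two lines on one set of axes."""
    # Imported here so build_series still works where matplotlib is missing.
    import matplotlib.pyplot as plt

    # plot(x1, y1, x2, y2): the default color cycle draws the first line
    # in blue and the second in orange
    plt.plot(years, raw, smooth_years, smooth)
    plt.legend(["Original", "Moving average (k = {})".format(k)])
    plt.title("July temperature anomaly")
    plt.xlabel("Year")
    plt.ylabel("Temperature anomaly")
    plt.show()


# Demo on the first four values of SacramentoTemps.csv with k = 1
years, raw, smooth_years, smooth = build_series([-1.56, -0.08, -0.3, -1.44], 1)
print(smooth_years)  # [1881, 1882]
```

In plot_moving_ave.py you would build the series from the full file and then call plot_series(years, raw, smooth_years, smooth, k). Adding plt.savefig("matplotlib.pdf") just before plt.show() is one way to save the PDF for submission.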
Submit your completed program to Gradescope as plot_moving_ave.py and a screenshot or PDF of example output (using the same value of k from Part 7) as matplotlib.pdf. You will receive points after your submission has been manually graded by a TA.