Assignment 1: Reading Rectangular/Tabular Data
Not so simple!
STA141B Fall 2020
Professor Duncan Temple Lang
Due: Oct 12, 5pm
Submit via Canvas
This appears to be a simple assignment: read rectangular/tabular data from a file. It is simple, but it has some twists relative to reading a simple CSV (comma-separated value) file. It is a real-world case study, using data from the US Federal Election Commission (FEC).
Visit the Web page https://www.fec.gov/data/browse-data/?tab=bulk-data. Click on the Contributions by individuals panel. You'll see something like
Figure 1: FEC Bulk Data Web Page
Download the file from the 2019-20 link.
Read the contents of the file into an R data.frame.
Programmatically read the column names from the associated CSV file from the Header file link, or from the HTML table in the Data description for this file. (See the sketch after this list.)
Plot the number of contributions by date
Plot the number of contributions by state
Plot the number of contributions by state per capita (i.e., adjust by the state population)
Explore other aspects you think may be interesting, say why they are interesting, and summarize the key features and findings.
Bonus: How do the number and amount of contributions in this year's election campaign compare to those from the 2016 presidential election?
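The sketch below shows one way to get started, under some assumptions: that the 2019-20 link points to a ZIP archive containing a single "|"-delimited file with no header row, and that the Header file link is a one-line CSV of column names. The two URLs are assumptions; copy the actual ones from the links on the FEC page.

    # URLs assumed for illustration; verify them against the FEC page.
    data.url <- "https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip"
    header.url <- "https://www.fec.gov/files/bulk-downloads/data_dictionaries/indiv_header_file.csv"

    download.file(data.url, "indiv20.zip")
    data.file <- unzip("indiv20.zip")[1]      # path of the extracted data file

    # Read the column names programmatically rather than typing them in.
    col.names <- strsplit(readLines(header.url, n = 1), ",")[[1]]

    # The data file itself: "|"-separated, no header; turn off quote and
    # comment handling so stray " and # characters do not break the read.
    indiv <- read.table(data.file, sep = "|", header = FALSE, quote = "",
                        comment.char = "", col.names = col.names,
                        stringsAsFactors = FALSE)

If read.table() complains, that is usually a sign that one of these assumptions (delimiter, quoting, number of columns) is wrong; that sleuthing is part of the assignment.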
The data is reasonably large. If your computer cannot deal with all of the rows, read the largest subset you can using the nrows parameter of read.table. Indicate how many rows you read.
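For example, a sketch of reading only a subset, reusing data.file and col.names from above (the value of n here is arbitrary):

    n <- 1e6                                  # arbitrary; report the value you used
    indiv <- read.table(data.file, sep = "|", header = FALSE, quote = "",
                        comment.char = "", col.names = col.names,
                        stringsAsFactors = FALSE, nrows = n)
    nrow(indiv)                               # how many rows were actually read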
Also, identify some of the limitations/issues with reading just the first n rows of the file.
One of the learning goals of this assignment is for you to fill in missing details, ranging from vague and general instructions to unexpected complications. This involves asking questions, but being specific and precise, trying to figure things out before asking, and asking questions that go beyond what you have already been told, i.e., taking the initiative. The assignment also requires debugging and sleuthing to figure out what is going wrong, or more specifically, where your implicit assumptions are incorrect.
Another goal is to automate the computations and be able to reproduce them programmatically. This means writing code to do the computations, not manually specifying details such as the names of the columns. This allows others to reproduce the results and understand what was actually done, and it also reduces errors.
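As one illustration, the column names can also be scraped from the HTML table on the Data description page rather than typed by hand. This is a sketch only: the URL is an assumption (take it from the Data description link), and you should inspect the returned tables to see which one holds the field names. XML::readHTMLTable() does not fetch https pages itself, so download the page first.

    desc.url <- "https://www.fec.gov/campaign-finance-data/contributions-by-individuals-file-description/"  # assumed
    download.file(desc.url, "indiv-description.html")
    tbls <- XML::readHTMLTable("indiv-description.html", stringsAsFactors = FALSE)
    # Assuming the relevant table has one row per field with the field name
    # in its first column; check str(tbls) to confirm which table and column.
    col.names <- tbls[[1]][[1]]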
You will also likely need to read the documentation for the functions you use to see how to specify and control some of their non-default behaviors.
I will share with you guidelines for how to write your report separately.
Potentially Useful Functions
read.table(), read.csv(), plot(), density(), as.Date(), download.file(), XML::readHTMLTable().
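A rough sketch of the three plots follows, assuming the data frame indiv from above and that the header names include a transaction date field TRANSACTION_DT (stored as MMDDYYYY) and a STATE field; check the actual names against the header file. The state-population data frame is hypothetical: you need to assemble it yourself (e.g., from Census data).

    # Dates: pad to 8 digits in case they were read as integers and lost a
    # leading zero, then parse as month-day-year.
    indiv$date <- as.Date(sprintf("%08d", as.integer(indiv$TRANSACTION_DT)),
                          format = "%m%d%Y")

    # Number of contributions by date
    by.date <- table(indiv$date)
    plot(as.Date(names(by.date)), as.integer(by.date), type = "l",
         xlab = "Date", ylab = "Number of contributions")

    # Number of contributions by state
    by.state <- sort(table(indiv$STATE), decreasing = TRUE)
    barplot(by.state, las = 2, cex.names = 0.6, ylab = "Number of contributions")

    # Per capita, assuming a hypothetical data frame state.pop with columns
    # STATE and pop that you construct from a population source.
    per.capita <- by.state / state.pop$pop[match(names(by.state), state.pop$STATE)]
    barplot(sort(per.capita, decreasing = TRUE), las = 2, cex.names = 0.6,
            ylab = "Contributions per capita")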