Homework #3

Background on dataset: I pulled a dataset from the US Census Bureau that provides county-level results for the percentage voting for Bill Clinton in the 1992 presidential election, along with demographic variables. I have provided the raw data and a very brief, basic description of the data. The dataset's name is clinton1.dat, and the description of the data is in Clinton data description.txt (it's easier to read this file if you open it with Word or WordPad).

As basic background on US elections: a county is a geographic region within a state, which is a larger semi-autonomous region in the United States. There are 50 states in the United States. When voting occurs in the US, it is conducted at the county level and then aggregated to the state level. Thus, a candidate generally either wins or loses a county. This is actual election data from when Bill Clinton first ran for US president in 1992. You can see some interesting demographic patterns in which counties voted for him and which did not.

I have intentionally modified this data to make some messy parts for you. For example, there is some missing data and a few data entries that conflict with the format the column should be in.

I want you to do the following (you can review the key parts of session 05 and session 06 for code and materials that will help you with this):

Use standard Python I/O to read in the data as a standard text file without a header. You need to use your best judgement to figure out what data type you should use for each variable/column.
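As a starting point for the reading step, here is a minimal sketch that uses only built-in file I/O. It assumes the file is whitespace-delimited, which may not match clinton1.dat, so check the description file first. The function name `read_raw_rows` and the sample lines are made up for illustration and are not the real data.

```python
import os
import tempfile


def read_raw_rows(path):
    """Return one list of string fields per non-blank line of a headerless text file."""
    rows = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # ignore blank lines
            # ASSUMPTION: fields are separated by whitespace.
            # Use stripped.split(",") instead if the file is comma-separated.
            rows.append(stripped.split())
    return rows


# Made-up stand-in for clinton1.dat so the sketch runs on its own (not the real data).
SAMPLE_TEXT = "CountyA 40.5 33.1\nCountyB NA 35.0\n\nCountyC 52.0 abc\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.dat")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_TEXT)
    raw_rows = read_raw_rows(sample_path)

print(raw_rows)
```

Every field is kept as a string at this stage; deciding which columns should become numbers is left to the cleaning step.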

Write Python code that munges and cleans the data so it is ready for analysis. For example, figure out what to do with any bad data. Figure out what to do with missing values. Do not do this portion using pandas and do not fix anything by hand; all fixing must come from your Python code.
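One possible cleaning strategy, sketched without pandas: treat anything that cannot be parsed as a finite number as missing, drop rows with the wrong number of fields, and fill the remaining gaps with the column median. The set of missing-value tokens, the choice of median imputation, and the helper names `to_float` and `clean_rows` are all assumptions for illustration; dropping incomplete rows would be an equally defensible choice.

```python
import math
from statistics import median

# ASSUMPTION: these are the tokens used to mark missing values.
MISSING_TOKENS = {"", "NA", "N/A", ".", "?"}


def to_float(token):
    """Return token as a finite float, or None if it is missing or malformed."""
    token = token.strip()
    if token in MISSING_TOKENS:
        return None
    try:
        value = float(token)
    except ValueError:
        return None  # e.g. "abc" or "x12"
    return value if math.isfinite(value) else None


def clean_rows(raw_rows, numeric_cols, expected_len):
    """Parse numeric columns, drop malformed rows, and median-fill missing values."""
    parsed = []
    for row in raw_rows:
        if len(row) != expected_len:
            continue  # drop rows with too few or too many fields
        new_row = list(row)
        for col in numeric_cols:
            new_row[col] = to_float(row[col])
        parsed.append(new_row)

    for col in numeric_cols:
        observed = [r[col] for r in parsed if r[col] is not None]
        if not observed:
            continue  # nothing to impute from; the column stays None
        fill = median(observed)
        for r in parsed:
            if r[col] is None:
                r[col] = fill
    return parsed


# Made-up rows for illustration only (not the real data).
raw_rows = [
    ["CountyA", "40.5", "33.1"],
    ["CountyB", "NA", "35.0"],
    ["CountyC", "abc", "31.0"],
    ["CountyD", "50.5", "x12"],
    ["CountyE", "47.0"],
]
cleaned = clean_rows(raw_rows, numeric_cols=[1, 2], expected_len=3)
print(cleaned)
```

Whichever rule you choose, apply it in code and state it in your write-up, since imputation changes the distributions you examine later.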

Once you have fully cleaned up and prepared the data, write it out to a new CSV file.
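Writing the cleaned rows out can be done with the standard library's `csv` module. The sketch below is one way to do it; the column names in `HEADER`, the output file name, and the sample rows are hypothetical, so replace them with the names given in the data description.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical column names and made-up rows, for illustration only.
HEADER = ["county", "pct_clinton", "median_age"]
cleaned = [["CountyA", 40.5, 33.1], ["CountyB", 45.5, 35.0]]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "clinton_clean.csv")
    write_csv(out_path, HEADER, cleaned)
    # Read the file back to confirm what was written.
    with open(out_path, newline="", encoding="utf-8") as handle:
        written = list(csv.reader(handle))

print(written)
```

Adding a header row at this point makes the later pandas step simpler, because the column names travel with the file.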

Create a pandas DataFrame and load the data into it from the CSV file so that you can use a DataFrame object for further analysis.
Provide basic descriptive statistics on it through the pandas DataFrame object.
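Loading the CSV and summarizing it might look like the following sketch. To keep it self-contained it reads from an in-memory string rather than a file on disk; with a real file you would pass its path to `pd.read_csv`. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV content standing in for the cleaned file (not the real data).
CSV_TEXT = (
    "county,pct_clinton,median_age\n"
    "CountyA,40.5,33.1\n"
    "CountyB,45.5,35.0\n"
    "CountyC,52.0,31.0\n"
)

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Check that each column was given the type you intended.
print(df.dtypes)

# describe() reports count, mean, std, min, quartiles, and max for numeric columns.
summary = df.describe()
print(summary)
```

If a column you expected to be numeric shows up with dtype `object`, some non-numeric entries probably survived the cleaning step.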

Examine the shape of the data and its appropriateness for analysis by showing histograms and boxplots of its distributions through pandas DataFrames and matplotlib.
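A minimal plotting sketch, assuming matplotlib is installed, is shown below. It selects the off-screen "Agg" backend so it runs without a display; in an interactive session you would drop that line and call `plt.show()` instead. The column names and values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for interactive use

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only (not the real data).
df = pd.DataFrame(
    {
        "pct_clinton": [35.0, 38.0, 40.0, 44.0, 45.0, 52.0, 61.0],
        "median_age": [30.0, 32.0, 35.0, 38.0, 40.0, 45.0, 33.0],
    }
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows the overall shape and any skew of one variable.
df["pct_clinton"].plot(kind="hist", bins=5, ax=axes[0], title="pct_clinton")

# Boxplot: shows median, quartiles, and potential outliers for each column.
df.boxplot(column=["pct_clinton", "median_age"], ax=axes[1])

fig.tight_layout()
```

Histograms are most useful for judging skew, while boxplots make outliers easy to spot, so the two views complement each other.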

Decide if any of the data needs to be transformed to do further regression analysis; if so, do the data transformation and justify why you did it.
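As one example of a transformation, a strongly right-skewed variable is often log-transformed before regression. The sketch below compares skewness before and after applying `np.log1p`; the column name `pop_density` and its values are made up, and a log transform is only one option among several (square root and Box-Cox are others).

```python
import numpy as np
import pandas as pd

# Made-up, deliberately right-skewed values (not the real data).
df = pd.DataFrame({"pop_density": [10, 12, 15, 20, 30, 50, 400, 2500]})

# Sample skewness: values far above 0 indicate a long right tail.
skew_before = df["pop_density"].skew()

# log1p computes log(1 + x), which also handles zeros safely.
df["log_pop_density"] = np.log1p(df["pop_density"])
skew_after = df["log_pop_density"].skew()

print("skew before:", skew_before)
print("skew after: ", skew_after)
```

A drop in skewness like this is the kind of evidence you can cite when justifying a transformation in your write-up.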

Run at least two different kinds of basic analyses on any portions of the data you are interested in (correlation, regression, ANOVA, or something more complex), and interpret the results.
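For the analysis step, a correlation and a simple linear regression could be sketched as follows. This uses the pandas `Series.corr` method (Pearson by default) and `np.polyfit` for a degree-1 least-squares fit; a statistics package would give fuller output such as p-values. The column names and numbers are made up and say nothing about the real election data.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not the real data).
df = pd.DataFrame(
    {
        "median_age": [30, 32, 35, 38, 40, 45],
        "pct_clinton": [35, 38, 40, 44, 45, 52],
    }
)

# Analysis 1: Pearson correlation between the two columns.
r = df["median_age"].corr(df["pct_clinton"])

# Analysis 2: least-squares line pct_clinton = slope * median_age + intercept.
slope, intercept = np.polyfit(df["median_age"], df["pct_clinton"], deg=1)

print("r =", r)
print("slope =", slope, "intercept =", intercept)
```

When interpreting results, remember that the unit of analysis is the county, so a relationship across counties does not necessarily describe individual voters.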

Please turn in all of your code in order of running in one Word file (do not use Adobe). You can write up your results and interpretations at the end of the file.
