Homework 6: Analyze Celebrity Deaths
(Deadline as per Canvas)
This homework deals with the following topics:
- The pandas module
- Loading data
- Joining data
- Querying data
- Summarizing data
- Aggregate functions
- The numpy library
- The matplotlib library
- Data visualization
General Idea of the Assignment
In this assignment, you will analyze data from the file “celebrity_deaths_2016.xlsx” which contains records of deaths of famous humans and non-humans in 2016. You’ll use functions from the pandas module for loading, inspecting and querying data. You are expected to summarize data, create pivot tables and apply aggregate functions, and to visualize data using histograms and other kinds of plots.
Each question has its own cell with clear instructions. Follow those instructions and write your code after each occurrence of:
# your code here
Please use the exact variable name if it is specified in the comment.
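As a rough illustration of the summarizing and plotting mentioned above, the sketch below applies aggregate functions, builds a pivot table, and draws a histogram. It runs on a few made-up placeholder rows rather than the real spreadsheet, and every variable name in it (deaths, age_stats, by_month) is hypothetical. The names requested in the notebook's comments always take precedence.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up placeholder rows, NOT taken from the real spreadsheet.
deaths = pd.DataFrame({
    "date_of_death": pd.to_datetime(
        ["2016-01-05", "2016-01-20", "2016-02-11", "2016-02-15", "2016-03-02"]
    ),
    "name": ["A", "B", "C", "D", "E"],
    "age": [81, 67, 90, 45, 67],
    "cause_of_death": ["cancer", "cancer", "stroke", "cancer", "stroke"],
})

# Aggregate functions: several statistics of age for each cause.
age_stats = deaths.groupby("cause_of_death")["age"].agg(["count", "mean", "max"])

# Pivot table: number of deaths per month (rows) and cause (columns).
deaths["month"] = deaths["date_of_death"].dt.month
by_month = deaths.pivot_table(
    index="month",
    columns="cause_of_death",
    values="name",
    aggfunc="count",
    fill_value=0,
)

# numpy: count ages falling into 20-year-wide bins.
counts, edges = np.histogram(deaths["age"], bins=[40, 60, 80, 100])

# matplotlib: histogram of ages using the same bin edges.
fig, ax = plt.subplots()
ax.hist(deaths["age"], bins=edges)
ax.set_xlabel("age")
ax.set_ylabel("number of deaths")
ax.set_title("Age at death (placeholder data)")
```

The bin edges and the choice of month as the pivot index are arbitrary here; the actual questions may ask for different groupings.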
About the Data
All of the data is contained within the “celebrity_deaths_2016.xlsx” file which contains 2 sheets:
- “celeb_death”: contains records of deaths of famous humans and non-humans
  - There are 5 columns: date_of_death, name, age, bio, cause_id
- “cause_of_death”: contains the causes of the deaths
  - There are 2 columns: cause_id, cause_of_death
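One possible way to load both sheets listed above is sketched here. The helper load_sheets and the constants XLSX_PATH and SHEETS are hypothetical names, and the sketch assumes the workbook sits in the working directory and that an Excel engine such as openpyxl is installed. The loading step is skipped if the file is not found.

```python
from pathlib import Path

import pandas as pd

# Assumption: the workbook is in the current working directory.
XLSX_PATH = Path("celebrity_deaths_2016.xlsx")
SHEETS = ["celeb_death", "cause_of_death"]


def load_sheets(path=XLSX_PATH):
    """Return a dict that maps each sheet name to a DataFrame."""
    # Passing a list to sheet_name makes read_excel return a dict of DataFrames.
    return pd.read_excel(path, sheet_name=SHEETS)


if XLSX_PATH.exists():
    frames = load_sheets()
    celeb_death = frames["celeb_death"]
    cause_of_death = frames["cause_of_death"]
    # Quick inspection: first rows, then column types and non-null counts.
    print(celeb_death.head())
    celeb_death.info()
```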
During this exercise, you’ll need to merge the “celeb_death” data with the “cause_of_death” data using the “cause_id” column. This will give you the cause for each death.
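The merge described above might look roughly like the following sketch. It uses made-up placeholder rows, and the choice of how="left" is an assumption made so that records without a reported cause are kept; the notebook may ask for a different join type.

```python
import pandas as pd

# Made-up placeholder rows, NOT taken from the real spreadsheet.
celeb_death = pd.DataFrame({
    "name": ["A", "B", "C"],
    "age": [81, 67, 90],
    "cause_id": [1.0, 2.0, None],  # None: cause not reported for this record
})
cause_of_death = pd.DataFrame({
    "cause_id": [1, 2],
    "cause_of_death": ["cancer", "stroke"],
})

# Left join on the shared key: every death record is kept, and records
# with no matching cause_id get NaN in the cause_of_death column.
merged = celeb_death.merge(cause_of_death, on="cause_id", how="left")
print(merged)
```

With an inner join instead, the record with the missing cause_id would be dropped from the result.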
Other information about the dataset:
- The cause of death was not reported for all individuals
- The dataset might include deaths that took place in other years (you’ll need to ignore these records)
- The dataset might contain duplicate records (you’ll need to remove them)
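The two cleaning steps noted above (removing duplicates and ignoring deaths outside 2016) could be sketched as follows. The rows are made-up placeholders and the variable names raw, deduped, and only_2016 are hypothetical. The sketch also assumes that a duplicate means a row identical in every column.

```python
import pandas as pd

# Made-up placeholder rows, NOT taken from the real spreadsheet.
raw = pd.DataFrame({
    "date_of_death": ["2016-01-05", "2016-01-05", "2015-12-30", "2016-07-19"],
    "name": ["A", "A", "B", "C"],
    "age": [81, 81, 67, 90],
})

# Make sure the date column is a datetime type (harmless if it already is).
raw["date_of_death"] = pd.to_datetime(raw["date_of_death"])

# Drop rows that are exact copies of an earlier row.
deduped = raw.drop_duplicates()

# Keep only deaths that took place in 2016.
only_2016 = deduped[deduped["date_of_death"].dt.year == 2016].reset_index(drop=True)
print(only_2016)
```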
Submission
To complete the assignment, download celebrity_deaths_2016.ipynb and celebrity_deaths_2016.xlsx.
Evaluation
Two points for each question.