MANIPULATING AND EXPLORING HONG KONG PROPERTIES DATA
Part I: Creating project
Create a self-contained project in RStudio (refer to the lecture PPT "Data, project, and plotting").
Create a new script under the src subdirectory in the project folder, and write all commands you use for this assignment in this script.
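The subfolders this assignment relies on (src, data, output, img) can also be created from R. The following is only a sketch: project_root is a hypothetical name, and a temporary directory stands in for the real project folder.

```r
# Stand-in for the project root; in an RStudio project this would be
# the project folder itself (project_root is a hypothetical name).
project_root <- file.path(tempdir(), "hk_properties_project")

# Subdirectories named in the assignment.
subdirs <- c("src", "data", "output", "img")

# Create each subdirectory; recursive = TRUE also creates the root.
for (d in subdirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# List the created subdirectories.
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```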
Part II: Data preparation
Download the dataset HKproperties.xlsx, and save it into the subdirectory named data in the project folder.
Use an R statement to import the data into a data frame called hkp. Do NOT import character fields as factors.
Hint: you may use the parameter stringsAsFactors = FALSE.
Use an R statement to check the first 6 lines of hkp to make sure it was imported correctly.
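One possible way to do the import and the check is sketched below. It assumes the readxl package is installed, that the workbook sits at data/HKproperties.xlsx relative to the working directory, and that the data is on the first sheet; none of these are stated in the assignment. readxl keeps text columns as character, so the stringsAsFactors hint applies only if you use an import function that accepts that parameter.

```r
# Assumed location of the workbook, relative to the project root.
path <- file.path("data", "HKproperties.xlsx")

# Run the import only when the file and the readxl package are available.
if (file.exists(path) && requireNamespace("readxl", quietly = TRUE)) {
  # read_excel() returns a tibble whose text columns are character,
  # not factor; as.data.frame() turns it into a plain data frame.
  hkp <- as.data.frame(readxl::read_excel(path))

  # First 6 rows, to check that the import looks right.
  print(head(hkp))
}
```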
Part III: Data manipulation/analytics using R commands
Add a new row to hkp: id = 6 Kowloon Tong, Block 3, block = Middle Floor, direction = South, hkd.m = 10.2, year = 2007, room = 3, gross.area = 716; set the remaining columns to NA.
Hint: a viable approach is to use the rbind function.
To check if your new row is added, print the last 10 rows of hkp.
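The rbind approach can be sketched on a toy data frame. The toy rows and their values are made up for illustration, the real hkp has more columns, and the id is read here as the text "6 Kowloon Tong, Block 3", which is an assumption.

```r
# Toy stand-in for hkp (hypothetical values; the real data has more columns).
hkp <- data.frame(
  id          = c("1 Example Court", "2 Example Court"),
  block       = c("Lower Floor", "Upper Floor"),
  direction   = c("North", "East"),
  hkd.m       = c(5.1, 7.3),
  year        = c(1995, 2001),
  room        = c(2, 3),
  gross.area  = c(520, 680),
  price.ft.sq = c(9808, 10735),
  stringsAsFactors = FALSE
)

# The new row with the values given in the task
# (id interpreted as one text value; this is an assumption).
new_row <- data.frame(
  id = "6 Kowloon Tong, Block 3", block = "Middle Floor", direction = "South",
  hkd.m = 10.2, year = 2007, room = 3, gross.area = 716,
  stringsAsFactors = FALSE
)

# Every column of hkp that the new row does not mention is set to NA.
missing_cols <- setdiff(names(hkp), names(new_row))
new_row[missing_cols] <- NA

# rbind() matches data frame columns by name and appends the row.
hkp <- rbind(hkp, new_row)

# Last 10 rows (here, all 3) to confirm the row was added.
tail(hkp, 10)
```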
Check the data types of the columns. If some columns have the wrong data type, correct them.
Hint: columns 5-11 are supposed to be numeric; otherwise, calculations cannot be carried out!
Make a new data frame called hkp1996 that contains all columns of hkp, but only for the years from 1996 onward (including 1996). You are doing this because only from 1996 onward do all of the floor types have data.
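A minimal sketch of checking and fixing column types, on a toy data frame whose numeric columns were read as text. The column positions 2:3 are specific to this toy example; in the real data the hint points at the columns that should be numeric.

```r
# Toy stand-in: numeric columns imported as text (hypothetical values).
hkp <- data.frame(
  id    = c("a", "b"),
  hkd.m = c("5.1", "7.3"),
  year  = c("1995", "2001"),
  stringsAsFactors = FALSE
)

# Inspect the type of every column.
sapply(hkp, class)

# Positions of the columns that should be numeric in this toy example.
num_cols <- 2:3

# Convert each selected column; entries that are not numbers would
# become NA with a warning.
hkp[num_cols] <- lapply(hkp[num_cols], as.numeric)

# Check the types again.
sapply(hkp, class)
```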
Put 4 columns of hkp1996, hkd.m, gross.area, price.ft.sq, and efficiency.ratio, into a new data frame named hkp1996.value.
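The two subsetting steps above can be sketched as follows, on a toy hkp with made-up values. Using which() is a design choice here: it skips rows whose year is NA, which plain logical indexing would return as all-NA rows.

```r
# Toy stand-in for hkp (hypothetical values).
hkp <- data.frame(
  block            = c("Lower Floor", "Middle Floor", "Upper Floor", "Middle Floor"),
  year             = c(1994, 1996, 2003, NA),
  hkd.m            = c(3.2, 4.5, 8.1, 10.2),
  gross.area       = c(480, 610, 820, 716),
  price.ft.sq      = c(6667, 7377, 9878, NA),
  efficiency.ratio = c(0.78, 0.81, 0.75, NA)
)

# Keep all columns, but only rows from 1996 onward (1996 included).
# which() drops rows where year is NA.
hkp1996 <- hkp[which(hkp$year >= 1996), ]

# Keep only the four value columns.
value_cols <- c("hkd.m", "gross.area", "price.ft.sq", "efficiency.ratio")
hkp1996.value <- hkp1996[, value_cols]

hkp1996.value
```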
Find the medians of hkd.m, gross.area, price.ft.sq, and efficiency.ratio in hkp1996.value using a single line of R code.
Hint: you may use apply with the median function; remember to use the parameter na.rm = TRUE to cope with NA values.
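The hint can be illustrated on a toy hkp1996.value with made-up numbers. This is a sketch of one approach, not the only valid one.

```r
# Toy stand-in for hkp1996.value (hypothetical values, with some NA).
hkp1996.value <- data.frame(
  hkd.m            = c(4.5, 8.1, 10.2),
  gross.area       = c(610, 820, 716),
  price.ft.sq      = c(7377, 9878, NA),
  efficiency.ratio = c(0.81, 0.75, NA)
)

# MARGIN = 2 applies median() to each column;
# na.rm = TRUE is passed on to median() so NA values are ignored.
medians <- apply(hkp1996.value, 2, median, na.rm = TRUE)

medians
```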
Calculate the mean for price.ft.sq at every floor type in hkp1996. Put it into a data frame called hkp1996mean.
Hint: use the aggregate function or the ddply function; watch out for NA values.
Repeat the previous step for hkp, and place the result in an object called hkpMean.
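A sketch of the aggregate approach on toy data. It assumes the floor type is stored in the block column, as the new-row example earlier suggests; the values are made up.

```r
# Toy stand-in for hkp1996 (hypothetical values, one NA price).
hkp1996 <- data.frame(
  block       = c("Lower Floor", "Lower Floor", "Middle Floor",
                  "Middle Floor", "Upper Floor"),
  price.ft.sq = c(7000, 8000, 9000, NA, 10000)
)

# Mean price per floor type. The formula interface of aggregate()
# drops rows containing NA by default (na.action = na.omit).
hkp1996mean <- aggregate(price.ft.sq ~ block, data = hkp1996, FUN = mean)

hkp1996mean
```

The same call with data = hkp would give the all-years means for hkpMean.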
Rename the two columns of hkp1996mean to floor and price.1996.mean; rename the two columns of hkpMean to floor and price.mean.
Then, add a column to hkpMean called price.1996.mean that contains the means calculated in hkp1996mean.
In hkpMean, create another column called mean.diff that is the difference between price.1996.mean and price.mean, i.e., price.1996.mean - price.mean.
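The renaming, column-adding, and difference steps above can be sketched on two toy result tables with made-up means. Aligning with match() is a design choice: it pairs rows by floor label instead of relying on both tables being in the same row order.

```r
# Toy stand-ins for the two aggregate results (hypothetical values).
hkp1996mean <- data.frame(
  block       = c("Lower Floor", "Middle Floor", "Upper Floor"),
  price.ft.sq = c(7500, 9000, 10000)
)
hkpMean <- data.frame(
  block       = c("Lower Floor", "Middle Floor", "Upper Floor"),
  price.ft.sq = c(7000, 8200, 9100)
)

# Rename the two columns of each data frame.
names(hkp1996mean) <- c("floor", "price.1996.mean")
names(hkpMean)     <- c("floor", "price.mean")

# Add the 1996-onward means to hkpMean, aligned by floor label.
idx <- match(hkpMean$floor, hkp1996mean$floor)
hkpMean$price.1996.mean <- hkp1996mean$price.1996.mean[idx]

# Difference between the 1996-onward mean and the all-years mean.
hkpMean$mean.diff <- hkpMean$price.1996.mean - hkpMean$price.mean

hkpMean
```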
Write hkpMean to an Excel workbook file under the output subfolder.
Hint: the Excel file should contain the columns floor, price.mean, price.1996.mean, and mean.diff.
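Writing the workbook might look like the sketch below. It assumes the writexl package is installed (other packages can write Excel files as well) and uses a toy hkpMean with made-up values; the file name hkpMean.xlsx follows the Deliverables section.

```r
# Toy stand-in for hkpMean (hypothetical values).
hkpMean <- data.frame(
  floor           = c("Lower Floor", "Middle Floor", "Upper Floor"),
  price.mean      = c(7000, 8200, 9100),
  price.1996.mean = c(7500, 9000, 10000),
  mean.diff       = c(500, 800, 900)
)

# Make sure the output subfolder exists.
dir.create("output", showWarnings = FALSE)
out_path <- file.path("output", "hkpMean.xlsx")

# Write the workbook only if the (assumed) writexl package is available.
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(hkpMean, path = out_path)
}
```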
Part IV: Simple Chart
Make a scatter plot of price.ft.sq versus gross.area using the data for 1996 and later. Save the chart as an image in the img subdirectory.
Use the function hist to create a histogram of the column gross.area in the hkp1996 data frame.
Save the charts as images into the img subdirectory.
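Both charts can be produced with base R graphics and saved through the png device. The sketch below uses a toy hkp1996 with made-up values, and the image file names are hypothetical.

```r
# Toy stand-in for hkp1996 (hypothetical values).
hkp1996 <- data.frame(
  gross.area  = c(480, 610, 820, 716, 550),
  price.ft.sq = c(6667, 7377, 9878, 8100, 7200)
)

# Make sure the img subfolder exists.
dir.create("img", showWarnings = FALSE)

# Scatter plot: price.ft.sq (y axis) versus gross.area (x axis).
scatter_file <- file.path("img", "scatter_price_vs_area.png")
png(scatter_file)
plot(hkp1996$gross.area, hkp1996$price.ft.sq,
     xlab = "gross.area", ylab = "price.ft.sq",
     main = "Price per sq ft vs gross area, 1996 onward")
dev.off()

# Histogram of gross.area.
hist_file <- file.path("img", "hist_gross_area.png")
png(hist_file)
hist(hkp1996$gross.area, xlab = "gross.area",
     main = "Histogram of gross.area, 1996 onward")
dev.off()
```

Each png() call opens a file device, and the matching dev.off() closes it so the image is actually written to disk.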
Deliverables
You need to submit a zip/rar package of the entire project directory, which contains:
Your raw dataset HKproperties.xlsx under the project's subfolder named data
Your output dataset hkpMean.xlsx under the project's subfolder named output
Your R script with all the code you used to do the data manipulation, in the subfolder named src
Two charts as images, in the subfolder named img
Submissions
Submit the .zip/.rar package of the entire project folder (a hard copy is not necessary).
Indicate your student number and name.