[Solved] CS1009 Homework 3-Forecasting Bike Sharing Usage

You are hired by the administrators of the Capital Bikeshare program (https://www.capitalbikeshare.com) in Washington D.C. to help them predict the hourly demand for rental bikes and give them suggestions on how to increase their revenue. You will prepare a small report for them.

The hourly demand information would be useful in planning the number of bikes that need to be available in the system at any given hour of the day, and also in monitoring traffic in the city. It costs the program money if bike stations are full and bikes cannot be returned, or empty and there are no bikes available. You will use multiple linear regression and polynomial regression and will explore techniques for subset selection. The goal is to build a regression model that can predict the total number of bike rentals in a given hour of the day, based on attributes of the hour and the day.

An example of a suggestion to increase revenue might be to offer discounts during certain times of the day, either during holidays or non-holidays. Your suggestions will depend on your observations of the seasonality of ridership.

The data for this problem were collected from the Capital Bikeshare program over the course of two years (2011 and 2012).

Use only the libraries below:

Data Exploration & Preprocessing, Multiple Linear Regression, Subset Selection

Overview

The initial data set is provided in the file data/BSS_hour_raw.csv. You will add some features that will help with the analysis and then separate it into training and test sets. Each row in this file contains 12 attributes and represents one hour of a 24-hour day, with its weather and other conditions and the number of rental rides in that hour, split by whether they were made by registered or casual riders. The attributes are the following:

dteday (date in the format YYYY-MM-DD, e.g. 2011-01-01)
season (1 = winter, 2 = spring, 3 = summer, 4 = fall)
hour (0 for 12 midnight, 1 for 1:00am, 23 for 11:00pm)
weekday (0 through 6, with 0 denoting Sunday)
holiday (1 = the day is a holiday, 0 = otherwise)
weather
    1: Clear, Few clouds, Partly cloudy
    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3: Light Snow, Light Rain + Thunderstorm
    4: Heavy Rain + Thunderstorm + Mist, Snow + Fog
temp (temperature in Celsius)
atemp (apparent temperature, or relative outdoor temperature, in Celsius)
hum (relative humidity)
windspeed (wind speed)
casual (number of rides in that hour made by casual riders, not registered in the system)
registered (number of rides in that hour made by registered riders)

General Hints

Use pandas .describe() to see statistics for the dataset.
When manipulating column data, it is useful and often more efficient to write a function and apply it to the column as a whole, rather than iterating through the elements.
A scatterplot matrix or a correlation matrix is a good way to see dependencies between multiple variables.
For Question 2, a very useful pandas method is .groupby(). Make sure you aggregate the rest of the columns in a meaningful way.
Print the dataframe to make sure all variables/columns are there!

Resources

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html

Question 1: Explore how Bike Ridership varies with Hour of the Day

Learn your Domain and Perform a bit of Feature Engineering

1.1 Load the dataset from the csv file data/BSS_hour_raw.csv into a pandas dataframe that you name bikes_df. Do any of the variables' ranges or averages seem suspect? Do the data types make sense?

1.2 Notice that the column dteday has the pandas dtype object, which is not useful when you want to extract elements of the date such as the year, month, and day. Convert dteday into a datetime object to prepare it for later analysis.

1.3 Create three new columns in the dataframe:
year with 0 for 2011 and 1 for 2012.
month with 1 through 12, with 1 denoting Jan.
counts with the total number of bike rentals for that hour (this is the response variable for later).

1.4 Use visualization to inspect and comment on how casual rentals and registered rentals vary with the hour.

1.5 Use the variable holiday to show how holidays affect the relationship in question 1.4. What do you observe?

1.6 Use visualization to show how weather affects casual and registered rentals. What do you observe?
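As a rough illustration of steps 1.2 and 1.3, and of the kind of grouping that could feed the plots in 1.4 and 1.5, a minimal pandas sketch might look like the following. The toy rows are made-up numbers standing in for data/BSS_hour_raw.csv, and the helper name add_features is hypothetical, not part of the assignment.

```python
import pandas as pd


def add_features(bikes_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with dteday parsed as datetime and year, month, counts added.

    add_features is a hypothetical helper name used only for this sketch.
    """
    out = bikes_df.copy()
    # 1.2: parse the YYYY-MM-DD strings into datetime64 values
    out["dteday"] = pd.to_datetime(out["dteday"], format="%Y-%m-%d")
    # 1.3: year is 0 for 2011 and 1 for 2012; month runs from 1 to 12
    out["year"] = (out["dteday"].dt.year - 2011).astype(int)
    out["month"] = out["dteday"].dt.month
    # 1.3: total rentals in the row = casual + registered
    out["counts"] = out["casual"] + out["registered"]
    return out


# Toy rows with made-up numbers (an assumption for illustration only).
# With the real file, one would instead start from:
#   bikes_df = pd.read_csv("data/BSS_hour_raw.csv")
toy_df = pd.DataFrame(
    {
        "dteday": ["2011-01-01", "2011-01-01", "2012-07-04", "2012-07-04"],
        "hour": [8, 17, 8, 17],
        "holiday": [0, 0, 1, 1],
        "casual": [3, 10, 20, 45],
        "registered": [30, 60, 15, 40],
    }
)

featured_df = add_features(toy_df)

# Mean casual and registered rentals per hour, split by holiday flag.
# A table like this is one possible input for the plots in 1.4 and 1.5.
hourly_means = featured_df.groupby(["holiday", "hour"])[["casual", "registered"]].mean()

print(featured_df[["dteday", "year", "month", "counts"]])
print(hourly_means)
```

Grouping by both holiday and hour keeps the two rider types side by side, so the same table can be read for 1.4 (ignoring the holiday level) and for 1.5 (comparing the two holiday levels).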
