Additional explanation of your data and data collection process.
Outline your analytical procedure:
You can organize the procedure into subsections; in each subsection, document the major analytical results and provide a brief discussion, if any.
Group project notes: data modeling and prediction results; model type (regression or classification) and the meaning of the results; linear regression; dataset; predictive modeling results; comparison of results for the peak season.
Dataset download: https://www.kaggle.com/usdot/flight-delays (file: flights.csv)
Data Cleansing
Load data from airports.csv, flights.csv, airlines.csv by using pandas
import datetime, warnings, scipy
from collections import OrderedDict

import pandas as pd
import numpy as np
import seaborn as sns
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly import tools
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.patches import ConnectionPatch
from matplotlib.gridspec import GridSpec
from mpl_toolkits.basemap import Basemap
from scipy.optimize import curve_fit
from IPython.core.interactiveshell import InteractiveShell

# Plot styling
%matplotlib inline
sns.set_style('whitegrid')
plt.rcParams['patch.force_edgecolor'] = True
plt.style.use('fivethirtyeight')
mpl.rc('patch', edgecolor='dimgray', linewidth=1)

# Notebook and pandas display options
InteractiveShell.ast_node_interactivity = 'last_expr'
pd.options.display.max_columns = 50
warnings.filterwarnings('ignore')
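# Load tables
airlines = pd.read_csv('../airlines.csv')
airports = pd.read_csv('../airports.csv')
# Keep only the first 100,000 flights
flights = pd.read_csv('../flights.csv', low_memory=False, encoding='utf8').head(100000)
print(flights.shape)
flights.DEPARTURE_TIME.head(10)
# Assemble a DATE column from the YEAR, MONTH and DAY columns
flights['DATE'] = pd.to_datetime(flights[['YEAR', 'MONTH', 'DAY']])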
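The loading step above reads all of flights.csv and then keeps the first 100,000 rows. As a minimal sketch, assuming the column names of the Kaggle file, the same kind of sample can be read directly with the `nrows` argument of `pd.read_csv`, after which the DATE column is assembled from YEAR, MONTH and DAY. The in-memory CSV text is made-up illustration data, not the real file.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for flights.csv (illustration only).
csv_text = """YEAR,MONTH,DAY,AIRLINE,SCHEDULED_DEPARTURE
2015,1,1,AS,5
2015,1,1,AA,10
2015,1,2,US,2400
"""

# nrows stops parsing after the requested number of rows,
# so the rest of the file is never loaded.
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)

# pd.to_datetime assembles a date from year/month/day columns;
# the upper-case names used in the dataset are accepted.
sample['DATE'] = pd.to_datetime(sample[['YEAR', 'MONTH', 'DAY']])

print(sample.shape)
print(sample['DATE'].iloc[0])
```

For the real file, `io.StringIO(csv_text)` would be replaced by the path to flights.csv and `nrows=2` by `nrows=100000`.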
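def get_datetime_from_string(time):
    # Convert an HHMM-encoded number (e.g. 5 -> 00:05, 1430 -> 14:30) to datetime.time
    if pd.isnull(time):
        return np.nan
    else:
        if time == 2400: time = 0
        time = "{0:04d}".format(int(time))
        time = datetime.time(int(time[0:2]), int(time[2:4]))
        # Left disabled: get_datetime() below needs a datetime.time, not a string
        # time = datetime.datetime.strptime(time, '%H:%M').time().strftime('%H:%M')
        return time

def get_datetime(x):
    # Combine a (date, time) pair into one datetime; missing parts give NaN
    if pd.isnull(x.iloc[0]) or pd.isnull(x.iloc[1]):
        return np.nan
    else:
        return datetime.datetime.combine(x.iloc[0], x.iloc[1])

def get_flight_time(flights, col):
    # Build one datetime per flight from DATE and the HHMM-encoded column `col`
    delay_list = []
    for index, cols in flights[['DATE', col]].iterrows():
        if pd.isnull(cols.iloc[1]):
            delay_list.append(np.nan)
        elif float(cols.iloc[1]) == 2400:
            # 2400 means midnight at the start of the next day
            cols.iloc[0] += datetime.timedelta(days=1)
            cols.iloc[1] = datetime.time(0, 0)
            delay_list.append(get_datetime(cols))
        else:
            cols.iloc[1] = get_datetime_from_string(cols.iloc[1])
            delay_list.append(get_datetime(cols))
    return pd.Series(delay_list)

flights['SCHEDULED_DEPARTURE'] = get_flight_time(flights, 'SCHEDULED_DEPARTURE')
flights['SCHEDULED_ARRIVAL'] = get_flight_time(flights, 'SCHEDULED_ARRIVAL')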
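The row-by-row loop above can be slow on 100,000 flights. The following is a minimal vectorized sketch of the same conversion, assuming the times are HHMM-encoded numbers and that 2400 means midnight of the following day, as in the loop. The function name `hhmm_to_datetime` is hypothetical and not part of the original code; unlike the loop, it returns NaT rather than NaN for missing times.

```python
import numpy as np
import pandas as pd

def hhmm_to_datetime(dates, hhmm):
    """Combine a date Series with HHMM-encoded times (5 -> 00:05, 2400 -> next day 00:00)."""
    hhmm = pd.to_numeric(hhmm, errors='coerce')  # missing values stay NaN
    hours = hhmm // 100
    minutes = hhmm % 100
    # 24 hours and 0 minutes rolls over to the next day without a special case.
    offset = pd.to_timedelta(hours, unit='h') + pd.to_timedelta(minutes, unit='m')
    return dates + offset  # missing times become NaT

# Made-up rows for illustration.
demo = pd.DataFrame({
    'DATE': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01']),
    'SCHEDULED_DEPARTURE': [5, 2400, np.nan],
})
result = hhmm_to_datetime(demo['DATE'], demo['SCHEDULED_DEPARTURE'])
print(result)
```

Because the result is a datetime64 column, later steps such as sorting by departure time or extracting the hour with `.dt.hour` work without further conversion.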
# Attach the airline name to each flight; the shared AIRLINE column
# becomes AIRLINE_x (code, from flights) and AIRLINE_y (name, from airlines)
delays = pd.merge(flights, airlines, left_on='AIRLINE', right_on='IATA_CODE', how='left')
# Remove unnecessary variables (depends on your need)
variables_to_remove = ['TAXI_OUT', 'TAXI_IN', 'WHEELS_ON', 'WHEELS_OFF', 'YEAR',
                       'MONTH', 'DAY', 'DAY_OF_WEEK', 'DATE', 'AIR_SYSTEM_DELAY',
                       'SECURITY_DELAY', 'AIRLINE_DELAY', 'LATE_AIRCRAFT_DELAY',
                       'WEATHER_DELAY', 'DIVERTED', 'CANCELLED', 'CANCELLATION_REASON',
                       'FLIGHT_NUMBER', 'TAIL_NUMBER', 'AIR_TIME']
delays.drop(variables_to_remove, axis=1, inplace=True)
delays = delays[['AIRLINE_y', 'ORIGIN_AIRPORT', 'DESTINATION_AIRPORT',
                 'SCHEDULED_DEPARTURE', 'DEPARTURE_TIME', 'DEPARTURE_DELAY',
                 'SCHEDULED_ARRIVAL', 'ARRIVAL_TIME', 'ARRIVAL_DELAY',
                 'SCHEDULED_TIME', 'ELAPSED_TIME']]
delays[:5]
delays = delays.rename(columns={'AIRLINE_y': 'AIRLINE'})
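The merge step selects a column called AIRLINE_y, which does not exist in either input file. It appears because both tables contain a column named AIRLINE, and pandas disambiguates them with its default suffixes. A small sketch with made-up rows (only the column names follow the dataset) shows the effect:

```python
import pandas as pd

# Made-up rows for illustration; 'ZZ' is a deliberately unmatched code.
flights_demo = pd.DataFrame({'AIRLINE': ['AA', 'ZZ'],
                             'DEPARTURE_DELAY': [-3.0, 12.0]})
airlines_demo = pd.DataFrame({'IATA_CODE': ['AA'],
                              'AIRLINE': ['American Airlines Inc.']})

merged = pd.merge(flights_demo, airlines_demo,
                  left_on='AIRLINE', right_on='IATA_CODE', how='left')

# Overlapping column names get the default suffixes _x (left) and _y (right).
print(merged.columns.tolist())

# how='left' keeps flights whose code has no match; their airline name is NaN.
print(merged['AIRLINE_y'].isna().sum())
```

Checking the number of unmatched rows after a left merge is a quick way to confirm that every carrier code in the flights table has a name in the airlines table.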
Data Description:
airlines.csv
IATA_CODE    object
AIRLINE      object
dtype: object
(14, 2)
airports.csv
IATA_CODE     object
AIRPORT       object
CITY          object
STATE         object
COUNTRY       object
LATITUDE     float64
LONGITUDE    float64
dtype: object
(322, 7)
flights.csv
YEAR                     int64
MONTH                    int64
DAY                      int64
DAY_OF_WEEK              int64
AIRLINE                 object
FLIGHT_NUMBER            int64
TAIL_NUMBER             object
ORIGIN_AIRPORT          object
DESTINATION_AIRPORT     object
SCHEDULED_DEPARTURE      int64
DEPARTURE_TIME         float64
DEPARTURE_DELAY        float64
TAXI_OUT               float64
WHEELS_OFF             float64
SCHEDULED_TIME         float64
ELAPSED_TIME           float64
AIR_TIME               float64
DISTANCE                 int64
WHEELS_ON              float64
TAXI_IN                float64
SCHEDULED_ARRIVAL        int64
ARRIVAL_TIME           float64
ARRIVAL_DELAY          float64
DIVERTED                 int64
CANCELLED                int64
CANCELLATION_REASON     object
AIR_SYSTEM_DELAY       float64
SECURITY_DELAY         float64
AIRLINE_DELAY          float64
LATE_AIRCRAFT_DELAY    float64
WEATHER_DELAY          float64
dtype: object
(5819079, 31)
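The listings above have the form of pandas `dtypes` and `shape` output. The sketch below shows one way such a summary could be produced; it is an assumption about how the description was generated, and the helper name `describe_table` and the two-row table are hypothetical.

```python
import pandas as pd

def describe_table(name, df):
    """Print the column dtypes and the (rows, columns) shape of one table."""
    print(name)
    print(df.dtypes)
    print(df.shape)

# Made-up two-row stand-in for airlines.csv.
airlines_demo = pd.DataFrame({
    'IATA_CODE': ['UA', 'AA'],
    'AIRLINE': ['United Air Lines Inc.', 'American Airlines Inc.'],
})
describe_table('airlines.csv', airlines_demo)
```

Calling the helper once per loaded table (airlines, airports, flights) would give the three blocks in the same order as above.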