University of Toronto Mississauga
STA302 Fall 2019 Assignment # 4
Due Date: Tuesday, December 3rd 2019, during lecture.
Last Name / Surname (please print):
First Name (please print):
Student Number:
Instructor: Al Nosedal
Tutorial Section (circle one):
T0101 18-19 Julian Braganza
INSTRUCTIONS and POLICIES:
Answer each of the questions.
Please attach a printed version of your code and plots to your assignment. Failure to provide a printed copy of R code and graphs may result in receiving no marks.
Recall that missed assignments earn a mark of zero; no exceptions.
Medical certificates and/or other valid documentation are not accepted.
Late submissions will not be accepted. A hard copy of the assignment should be handed in during lecture on the due date; email submissions are not accepted.
Question       1     2     TOTAL
Value          20    0     20
Mark Earned
GOOD LUCK !
Problem 1. (20 marks)
The general manager of the Cleveland Indians baseball team is in the process of determining which minor-league players to draft. He is aware that his team needs home-run hitters and would like to find a way to predict the number of home runs a player will hit. Being an astute statistics practitioner, he gathers a random sample of players and records the number of home runs each player hit in his first two full years as a major-league player (y), the number of home runs he hit in his last full year in the minor leagues (x1), his age (x2), and the number of years of professional baseball (x3).
The dataset is available at
draft_url = https://mcs.utm.utoronto.ca/~nosedal/data/baseball-draft.txt
Use R to answer questions a), b), c), and d). You have to show all your work to get full credit. Answers without justification, even if correct, will not receive any marks.
a) Using R, develop a regression model.
b) How well does the model fit?
c) Test the model's validity.
d) Determine whether the required conditions are satisfied.
e) Interpret each of the coefficients.
A few comments/suggestions
Check the Linearity Condition with scatterplots of the y-variable against each x-variable.
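The linearity check above could be sketched in R as follows. This is only an illustration on simulated stand-in data, not the assignment dataset: the data frame name draft and the column names y, x1, x2, x3 are assumptions borrowed from the problem statement, and the dashed lowess curve is an optional visual aid rather than part of the suggestion itself.

```r
# Simulated stand-in data; NOT the assignment dataset.
# The names draft, y, x1, x2, x3 are assumptions based on the problem statement.
set.seed(302)
n <- 50
draft <- data.frame(
  x1 = rpois(n, lambda = 15),             # minor-league home runs (made up)
  x2 = sample(20:30, n, replace = TRUE),  # age (made up)
  x3 = sample(1:8, n, replace = TRUE)     # years of professional baseball (made up)
)
draft$y <- pmax(1, round(5 + 1.2 * draft$x1 - 0.3 * draft$x2 +
                         0.5 * draft$x3 + rnorm(n, sd = 4)))

# One scatterplot of y against each predictor, side by side.
predictors <- c("x1", "x2", "x3")
old_par <- par(mfrow = c(1, 3))
for (p in predictors) {
  plot(draft[[p]], draft$y, xlab = p, ylab = "y", main = paste("y vs", p))
  # Dashed smooth trend: a roughly straight curve supports the Linearity Condition.
  lines(lowess(draft[[p]], draft$y), lty = 2)
}
par(old_par)
```

With the real data, the same loop would apply once the file has been read into a data frame and its actual column names substituted.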
If the scatterplots are straight enough, fit a multiple regression model to the data. Otherwise, either stop or consider re-expressing an x-variable or the y-variable.
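A minimal sketch of the fitting step, again on simulated stand-in data rather than the assignment dataset. The names draft, y, x1, x2, x3 are assumptions; the log re-expression at the end is shown only as one possible option, not as the recommended choice for this problem.

```r
# Simulated stand-in data; NOT the assignment dataset.
# With the real file, something like read.table(draft_url, header = TRUE)
# may work, but the file layout is an assumption; inspect the file first.
set.seed(302)
n <- 50
draft <- data.frame(
  x1 = rpois(n, lambda = 15),
  x2 = sample(20:30, n, replace = TRUE),
  x3 = sample(1:8, n, replace = TRUE)
)
draft$y <- pmax(1, round(5 + 1.2 * draft$x1 - 0.3 * draft$x2 +
                         0.5 * draft$x3 + rnorm(n, sd = 4)))

# Multiple regression of y on the three predictors.
fit <- lm(y ~ x1 + x2 + x3, data = draft)
fit_summary <- summary(fit)
print(fit_summary)

# Quantities commonly used to judge fit and overall validity.
r_squared <- fit_summary$r.squared
f_stat <- fit_summary$fstatistic            # named vector: value, numdf, dendf
f_p_value <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
                lower.tail = FALSE)

# If the scatterplots are not straight enough, one option is to re-express y
# and start the model fitting over (log requires y > 0).
# Note: R-squared values on different response scales are not directly comparable.
fit_log <- lm(log(y) ~ x1 + x2 + x3, data = draft)
```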
Find the residuals and predicted values.
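The residuals and predicted values, together with the diagnostic plots that the remaining suggestions describe, could be obtained along these lines. This is a sketch on simulated stand-in data under the same assumed names (draft, y, x1, x2, x3), not a worked solution for the assignment dataset.

```r
# Simulated stand-in data; NOT the assignment dataset.
set.seed(302)
n <- 50
draft <- data.frame(
  x1 = rpois(n, lambda = 15),
  x2 = sample(20:30, n, replace = TRUE),
  x3 = sample(1:8, n, replace = TRUE)
)
draft$y <- pmax(1, round(5 + 1.2 * draft$x1 - 0.3 * draft$x2 +
                         0.5 * draft$x3 + rnorm(n, sd = 4)))

fit <- lm(y ~ x1 + x2 + x3, data = draft)

res  <- resid(fit)    # residuals: observed minus predicted
pred <- fitted(fit)   # predicted (fitted) values
print(head(data.frame(observed = draft$y, predicted = pred, residual = res)))

# Sanity check: observed = predicted + residual, up to rounding error.
stopifnot(isTRUE(all.equal(unname(pred + res), draft$y)))

old_par <- par(mfrow = c(1, 3))
# Residuals against predicted values: look for bends or thickening.
plot(pred, res, xlab = "Predicted values", ylab = "Residuals",
     main = "Residuals vs predicted")
abline(h = 0, lty = 2)
# Histogram and Normal probability plot of the residuals.
hist(res, main = "Histogram of residuals", xlab = "Residuals")
qqnorm(res)
qqline(res)
par(old_par)
```

Plotting res against each predictor separately, for example plot(draft$x1, res), follows the same pattern.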
Make a scatterplot of the residuals against the predicted values (and ideally against each predictor variable separately). These plots should look patternless. Check, in particular, for any bend and for any thickening. If there's a bend, consider re-expressing the y and/or the x variables. If the variation in the plot grows from one side to the other, consider re-expressing the y-variable. If you re-express a variable, start the model fitting over. Here is a brief list of the most commonly used transformations.
1. Log transformation: y' = log(y) (provided y > 0). The log transformation is used when a) the variance of the error variable increases as y increases or b) the distribution of the error is positively skewed.
2. Square transformation: y' = y². Use this transformation when a) the variance is proportional to the expected value of y or b) the distribution of the error variable is negatively skewed.
3. Square-root transformation: y' = √y (provided that y ≥ 0). The square-root transformation is helpful when the variance is proportional to the expected value of y.
4. Reciprocal transformation: y' = 1/y. When the variance appears to significantly increase when y increases beyond some critical value, the reciprocal transformation is recommended.
If the conditions check out this far, feel free to interpret the regression model and use it for prediction.
Make a histogram and Normal probability plot of the residuals to check the Normal Condition.
Problem 2. (ZERO marks)
Watch Moneyball. Based on a true story, Moneyball is a movie for anybody who has ever dreamed of taking on the system. Brad Pitt stars as Billy Beane, the general manager of the Oakland A's and the guy who assembles the team, who has an epiphany: all of baseball's conventional wisdom is wrong.