1) (18 points) This question involves the use of multiple linear regression on the redwine
(winequality-red.csv) data set available on Canvas in the Datasets for Assignments module.
This is the same dataset used in Assignment 2.
a. (6 points) Perform a multiple linear regression with pH as the response and all other
variables except citric_acid as the predictors. Show a printout of the result (including
coefficient, error, and t-values for each predictor). Comment on the output by answering
the following questions:
i) Which predictors appear to have a statistically significant relationship to the response?
How do you determine this?
ii) What does the coefficient for the free_sulfur_dioxide variable suggest, in simple terms?
b. (6 points) Produce diagnostic plots of the linear regression fit. Comment on any problems
you see with the fit. Do the residual plots suggest any unusually large outliers? Does the
leverage plot identify any observations with unusually high leverage?
c. (6 points) Fit at least 3 linear regression models (exploring interaction effects) with alcohol
as the response and some combination of other variables as predictors. Do any interactions
appear to be statistically significant?

2) (30 points) This problem involves the Boston data set, which can be loaded from library MASS
in R and is also made available in the Datasets for Assignments module on Canvas
(boston.csv). We will now try to predict per capita crime rate (crim) using the other variables
in this data set. In other words, per capita crime rate is the response, and the other variables are
the predictors.
a. (6 points) For each predictor, fit a simple linear regression model to predict the response.
Include the code, but not the output for all the models in your solution.
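One possible way to organize the per-predictor fits is sketched below. It assumes the `Boston` data frame from the MASS package rather than boston.csv, and the names `predictors`, `simple_fits`, and `p_values` are illustrative choices, not part of the assignment.

```r
library(MASS)  # assumption: use the Boston data frame shipped with MASS

# Every column except the response is treated as a candidate predictor.
predictors <- setdiff(names(Boston), "crim")

# Fit one simple linear regression of the form crim ~ x per predictor.
simple_fits <- lapply(predictors, function(x) {
  lm(reformulate(x, response = "crim"), data = Boston)
})
names(simple_fits) <- predictors

# Collect the slope p-value from each fit (row 2 of the coefficient table).
p_values <- sapply(simple_fits, function(fit) {
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})
```

Keeping the fits in a named list makes it straightforward to pull out individual models later, for example `summary(simple_fits[["nox"]])`.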
b. (6 points) In which of the models is there a statistically significant association between the
predictor and the response? Considering the meaning of each variable, discuss the
relationship between crim and each of the predictors nox, chas, rm, dis and medv. How do
these relationships differ?
c. (6 points) Fit a multiple regression model to predict the response using all the predictors.
Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0?
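A minimal sketch of the full model follows, again assuming the `Boston` data frame from MASS. The 0.05 cutoff and the names `full_fit`, `coef_table`, and `significant` are assumptions made for illustration; the assignment does not fix a significance level.

```r
library(MASS)  # assumption: use the Boston data frame shipped with MASS

# Regress crim on all remaining columns at once.
full_fit <- lm(crim ~ ., data = Boston)

# Coefficient table: estimate, standard error, t value, p-value.
coef_table <- summary(full_fit)$coefficients

# Predictors whose t-test p-value falls below an assumed 0.05 threshold.
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
significant <- setdiff(significant, "(Intercept)")
```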
d. (6 points) How do your results from (a) compare to your results from (c)? You can present
this comparison as a plot or as a table or any other form of comparison you deem fit.
e. (6 points) Is there evidence of non-linear association between the predictors age and tax
and the response crim? To answer this question, for each predictor (age and tax), fit a model
of the form:
Y = β0 + β1X + β2X^2 + β3X^3 + ε
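As a rough sketch, the cubic form above could be fit for one predictor as follows. It assumes the `Boston` data frame from MASS; comparing against the linear fit with `anova()` is one possible way to gauge non-linearity, not a method the assignment prescribes, and the object names are illustrative.

```r
library(MASS)  # assumption: use the Boston data frame shipped with MASS

# Cubic fit using orthogonal polynomial terms of degree 1 to 3.
cubic_age <- lm(crim ~ poly(age, 3), data = Boston)

# Plain linear fit, nested inside the cubic model.
linear_age <- lm(crim ~ age, data = Boston)

# F-test of the linear model against the cubic model.
nonlinear_test <- anova(linear_age, cubic_age)
```

The same pattern applies to `tax` by swapping the predictor name.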
Hint: use the poly() function in R. Use the model to assess the extent of non-linear
association.

3) (12 points) Suppose we collect data for a group of students in a statistics class with variables:
X1 = hours studied,
X2 = undergrad GPA,
X3 = PSQI score (a sleep quality index), and
Y = receive an A.
We fit a logistic regression and produce estimated coefficients β0 = −8, β1 = 0.1, β2 = 1, β3 = −0.04.
a. (4 points) Estimate the probability that a student who studies for 32 h, has a PSQI score of
11 and has an undergrad GPA of 3.0 gets an A in the class. Show your work.
b. (4 points) How many hours would the student in part (a) need to study to have a 65 %
chance of getting an A in the class? Show your work.
c. (4 points) How many hours would a student with a 3.0 GPA and a PSQI score of 3 need to
study to have a 60 % chance of getting an A in the class? Show your work.
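For reference, the standard logistic model with three predictors and its inverted form are written out below. The symbols p(X), for the probability of receiving an A, and p*, for a target probability, are notation introduced here and do not appear in the problem statement.

```latex
p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}}
            {1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}}
\quad\Longleftrightarrow\quad
\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3

% Solving the log-odds equation for X_1 at a target probability p^*:
X_1 = \frac{\log\dfrac{p^*}{1 - p^*} - \beta_0 - \beta_2 X_2 - \beta_3 X_3}{\beta_1}
```

The first line is the form needed for a probability given all three inputs; the second is the form needed when the hours studied are the unknown.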