
[SOLVED] EN553413-613 Spring 2024 EN553413-613 Spring 2024 Exam 1 SQL


Applied Stats and Data Analysis

EN.553.413-613, Spring 2024

Feb 21, 2024

Exam 1

Question 1 (18 pts). The following TRUE/FALSE questions concern the Simple Linear Regression model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.

(a) TRUE or FALSE. For the least squares estimates $b_0$, $b_1$ we require the errors to be normally distributed.

(b) TRUE or FALSE. The estimated mean of the response variable at $X_i$ is defined as $b_0 + b_1 X_i$.

(c) TRUE or FALSE. One of the Gauss Markov conditions is $\sum_{i=1}^{n} e_i = 0$.

(d) TRUE or FALSE. Plotting $e_i^2$ vs $\hat{Y}_i$ is one of the diagnostic plots.

(e) TRUE or FALSE. QQ plot of the $Y_i$'s is one of the diagnostic plots.

(f) TRUE or FALSE. Low $R^2$ means that X and Y are not related.

(g) TRUE or FALSE. The $s^2$ is an estimate of the variance of $Y_i$.

(h) TRUE or FALSE. Coefficient of simple determination $R^2$ measures the proportion of the explained variation in Y over the unexplained variation in Y.

(i) TRUE or FALSE. In the Correlation model of the regression $X_i$'s are random variables.

Question 2 (18 pts). Let $X, Y, Z \sim$ iid $N(0, 1)$, i.e. they are independent, identically distributed standard normal random variables. For the following random variables state whether they follow a normal distribution, a t-distribution, a $\chi^2$ distribution, an F distribution, or none of the above. State relevant parameters (e.g. degrees of freedom, and means and variances for normal RVs).

(a) $3Y - Z$

(b) $X + Y + Z$

(c) $X^2 + Y^2 + Z^2$

(d) $\dfrac{X^2 + Y^2}{2Z^2}$

(e) $\dfrac{X^2}{\sqrt{Y^2 + Z^2}}$

(f) $\dfrac{(X + Y)^2}{2}$
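For reference, the standard constructions that this question relies on can be summarized as below. This is a reminder in common textbook notation, not an answer key; the symbols $a, b, c$, $Z_0, Z_1, \dots, Z_k$, $U$, $V$, $k$, $m$ are introduced here only for illustration.

```latex
\begin{align*}
aX + bY + cZ &\sim N\!\left(0,\ a^2 + b^2 + c^2\right)
  && \text{for constants } a, b, c, \\
Z_1^2 + \dots + Z_k^2 &\sim \chi^2(k)
  && \text{for independent } Z_i \sim N(0,1), \\
\frac{Z_0}{\sqrt{U/k}} &\sim t(k)
  && \text{for } Z_0 \sim N(0,1) \text{ independent of } U \sim \chi^2(k), \\
\frac{U/k}{V/m} &\sim F(k, m)
  && \text{for independent } U \sim \chi^2(k),\ V \sim \chi^2(m).
\end{align*}
```

As one instance of the first two lines: $(X + Y)/\sqrt{2}$ is $N(0, 1)$, so its square $(X + Y)^2/2$ is $\chi^2(1)$.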

Question 3 (20 pts). Suppose a data set $\{(X_i, Y_i) : 1 \le i \le n\}$ is fit to a linear model of the form

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where the $\varepsilon_i$ are independent, mean zero, and normal with common variance $\sigma^2$. Here we treat Y as the response variable and X as the predictor variable. The output of the lm function is given. Some values are hidden by ‘XXXXX’. We provide you with an additional value: $\bar{X} = 1.11$.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.9412     0.4593   4.226 0.000508 ***
x             0.7042     0.3697   1.905 0.072911 .

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9221 on 18 degrees of freedom
Multiple R-squared: 0.1678, Adjusted R-squared: -----
F-statistic: XXXXX on XX and XX DF, p-value: XXXXXX

(a) (2 points) How many data points are there (what is n, the sample size)? What is the estimated mean of the response variable Y at $X_h = 2$ for this dataset?
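The arithmetic behind this part can be sketched in a few lines of Python. The numbers are read off the lm output; the variable names are illustrative, and the only modelling fact used is that simple linear regression has $n - 2$ residual degrees of freedom.

```python
# Values read off the lm output.
b0 = 1.9412          # intercept estimate
b1 = 0.7042          # slope estimate
residual_df = 18     # "on 18 degrees of freedom"

# Simple linear regression estimates two parameters, so residual df = n - 2.
n = residual_df + 2

# Estimated mean response at X_h = 2 is b0 + b1 * X_h.
x_h = 2.0
y_hat_h = b0 + b1 * x_h

print("n =", n)
print("estimated mean at X_h = 2:", round(y_hat_h, 4))
```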

(b) (3 points) Based on all of this output, do you reject $H_0 : \beta_1 = 0$ in favour of $H_a : \beta_1 \neq 0$ at significance level $\alpha = 0.05$? What does the test tell us about the relationship between X and Y?

(c) (3 points) Based on all of this output, do you reject $H_0 : \beta_1 = 0$ vs $H_a : \beta_1 > 0$ at significance level $\alpha = 0.05$? Briefly explain why, or why not.

(d) (4 points) The degrees of freedom, the p-value and the value of the F statistic are hidden. Is it possible to reconstruct all of them based on the data shown? Recover as many values as you can.
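One way to approach this, sketched below in Python, uses the fact that in simple linear regression the overall F statistic equals the square of the slope's t statistic, so the F-test p-value coincides with the two-sided p-value of the slope. The cross-check through $R^2$ is an extra step added here for illustration; the two results differ slightly because the printed inputs are rounded.

```python
# Values read off the lm output.
t_slope = 1.905        # t value for the slope
p_slope = 0.072911     # two-sided p-value for the slope
r_squared = 0.1678
residual_df = 18

# With one predictor, the F test has 1 numerator and n - 2 denominator df.
df_num = 1
df_den = residual_df

# F = t^2 in simple linear regression.
f_from_t = t_slope ** 2

# Cross-check: F = (R^2 / 1) / ((1 - R^2) / (n - 2)).
f_from_r2 = (r_squared / df_num) / ((1 - r_squared) / df_den)

# The F-test p-value equals the two-sided t-test p-value for the slope.
p_value_f = p_slope

print("F from t^2:", round(f_from_t, 3))
print("F from R^2:", round(f_from_r2, 3))
print("df:", df_num, "and", df_den, " p-value:", p_value_f)
```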

(e) (4 points) Based on the data above find SSTo, SSR and SSE. Hint: Residual standard error may be useful here.
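Following the hint, a minimal sketch of the computation: the residual standard error is $\sqrt{MSE}$, so $SSE = MSE \cdot (n - 2)$, and $R^2 = 1 - SSE/SSTo$ then gives the other two sums of squares. Results are approximate because the printed inputs are rounded.

```python
# Values read off the lm output.
residual_se = 0.9221   # residual standard error, i.e. sqrt(MSE)
residual_df = 18       # n - 2
r_squared = 0.1678

mse = residual_se ** 2
sse = mse * residual_df

# R^2 = 1 - SSE / SSTo, so SSTo = SSE / (1 - R^2).
ssto = sse / (1 - r_squared)

# The total sum of squares decomposes as SSTo = SSR + SSE.
ssr = ssto - sse

print("SSE  =", round(sse, 3))
print("SSTo =", round(ssto, 3))
print("SSR  =", round(ssr, 3))
```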

(f) (4 points) Find the 95% confidence interval for the mean of the response function at $X_h = 2$. Write your answer in the form $A \pm B \cdot t(C, D)$ and specify the values A, B, C, D as precisely as you can (i.e. find the values of as many terms as you can).
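A sketch of how the four pieces might be assembled from the printed output, assuming $t(C, D)$ denotes the C-quantile of a t distribution with D degrees of freedom. It uses the identity $s^2\{\hat{Y}_h\} = MSE/n + (X_h - \bar{X})^2 \, s^2\{b_1\}$, which follows from $s^2\{b_1\} = MSE / \sum (X_i - \bar{X})^2$. The t quantile itself is left symbolic, since the standard library has no t-distribution quantile function.

```python
import math

# Values read off the lm output and the problem statement.
b0, b1 = 1.9412, 0.7042
se_b1 = 0.3697         # standard error of the slope
residual_se = 0.9221   # sqrt(MSE)
residual_df = 18
x_bar = 1.11
x_h = 2.0
alpha = 0.05

n = residual_df + 2
mse = residual_se ** 2

# A: point estimate of the mean response at X_h.
A = b0 + b1 * x_h

# B: estimated standard error of the fitted mean,
# s^2{Yhat_h} = MSE/n + (X_h - X_bar)^2 * s^2{b1}.
var_y_hat_h = mse / n + (x_h - x_bar) ** 2 * se_b1 ** 2
B = math.sqrt(var_y_hat_h)

# C, D: quantile level and degrees of freedom of the t multiplier.
C = 1 - alpha / 2
D = residual_df

print(f"A = {A:.4f}, B = {B:.4f}, C = {C:.3f}, D = {D}")
```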

Question 4 (14 pts). Consider the following diagnostic plots for two models (Model 1 and Model 2). Two simple linear regression models $Y = \beta_0 + \beta_1 X + \varepsilon$ are fitted to two different datasets of $(X, Y)$ observations, one for each model. For each model 3 diagnostic plots are shown: a plot of $Y_i$ vs $X_i$, a plot of semi-studentized residuals $e_i^*$ versus fitted values $\hat{Y}_i$, and a QQ-plot of the semi-studentized residuals $e_i^*$.
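For readers who want to see how the plotted quantities are obtained, here is a minimal Python sketch that computes fitted values and semi-studentized residuals $e_i / \sqrt{MSE}$. The five data points are hypothetical, invented only for illustration, and are unrelated to Model 1 or Model 2.

```python
import math

# Hypothetical toy data, invented only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares estimates for Y = b0 + b1 * X.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# MSE uses n - 2 degrees of freedom in simple linear regression.
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Semi-studentized residuals: each residual divided by sqrt(MSE).
semi_studentized = [e / math.sqrt(mse) for e in residuals]

for f, r in zip(fitted, semi_studentized):
    print(f"fitted = {f:.3f}, semi-studentized residual = {r:.3f}")
```

Plotting `semi_studentized` against `fitted`, or against normal quantiles, gives the second and third plot types described in the question.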

(a) (5 points) What is the main issue you diagnose with Model 1, if any? Why? Which plot was the most useful in diagnosing this problem? Be as specific in describing the issue as you can.

(b) (5 points) What is the main issue you diagnose with Model 2, if any? Why? Which plot was the most useful in diagnosing this problem? Be as specific in describing the issue as you can.

(c) (4 points) This question is unrelated to the above plots. Explain in what cases a transformation of the predictor variable X is more appropriate than a transformation of the response variable Y.

Question 5 (20 points). For a dataset of n = 200 observations a simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is fit. The following estimates are obtained.

b0 = 2, b1 = 1

We have listed additional information here

(a) (2 points) What is the estimated variance $s^2$ of the error term based on the data above?

(b) (3 points) Find a 90% confidence interval for $\beta_1$. Write it in the form $A \pm B \cdot t(C, D)$ and compute the values of A, B, C, D if possible.

(c) (4 points) Find the joint confidence intervals with confidence at least 90% for $\beta_0$, $\beta_1$ in the form $A_i \pm B_i \cdot t(C_i, D_i)$. Compute the values of $A_i$, $B_i$, $C_i$, $D_i$ if possible. Without any computation, how does the interval for $\beta_1$ in this part compare to the one in part (b)?

(d) (4 points) Find the joint confidence intervals using the Bonferroni procedure with confidence at least 90% for the mean of the response variable at $X_h = 2$ and $X_{h'} = 0$. Find it in the form $A_i \pm B_i \cdot t(C_i, D_i)$.
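The Bonferroni multiplier used for joint intervals such as these can be written out as below. This is a sketch of the general recipe with the numbers stated in the question ($n = 200$, family confidence 90%, two intervals); the symbol $g$ for the number of simultaneous intervals is introduced here. The standard errors $s\{\hat{Y}_h\}$ depend on the additional information, which is not reproduced in this copy, so they are left symbolic.

```latex
\begin{align*}
\text{family confidence } 1 - \alpha = 0.90,\quad g = 2
  &\;\Longrightarrow\; \text{each interval uses } t\!\left(1 - \tfrac{\alpha}{2g};\ n - 2\right)
   = t(0.975;\ 198), \\
\hat{Y}_h \pm t(0.975;\ 198)\, s\{\hat{Y}_h\},
  &\qquad \hat{Y}_h = b_0 + b_1 X_h .
\end{align*}
```

With $b_0 = 2$ and $b_1 = 1$, the centres are $\hat{Y}_h = 4$ at $X_h = 2$ and $\hat{Y}_{h'} = 2$ at $X_{h'} = 0$.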

(e) (4 points) Set up a General Linear Test for the data provided: specify the reduced and full model, compute the value of the F-statistic, specify its distribution under the null hypothesis.
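As a sketch of the setup, assuming the intended null hypothesis is $H_0 : \beta_1 = 0$ (the question does not state it explicitly), the general linear test takes the following form. The subscripts $F$ and $R$ for the full and reduced models are notation introduced here.

```latex
\begin{align*}
\text{Full model:}\quad & Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,
  & SSE(F) &= SSE, & df_F &= n - 2 = 198, \\
\text{Reduced model:}\quad & Y_i = \beta_0 + \varepsilon_i,
  & SSE(R) &= SSTo, & df_R &= n - 1 = 199, \\[4pt]
F^* &= \frac{\bigl(SSE(R) - SSE(F)\bigr) / (df_R - df_F)}{SSE(F) / df_F}
     = \frac{SSR / 1}{SSE / 198}
     = \frac{MSR}{MSE}
  & & & F^* &\sim F(1,\ 198) \text{ under } H_0 .
\end{align*}
```

The numerical value of $F^*$ requires the sums of squares from the additional information, which this copy does not reproduce.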

(f) (3 points) An Aspiring Data Scientist (ADS) noticed that one of the observed data points $(X_i, Y_i) = (2, 15)$ lies outside of the 99% Working-Hotelling band (we assume everything was computed correctly). They claim it is an issue. Briefly justify if their concern is correct or not.

Question 6 (20 points). Suppose $Y_i$ follows the model

$Y_i = \beta X_i + \varepsilon_i$

where the $\varepsilon_i$ are independent, identically distributed $N(0, \sigma^2)$. Note, there is no intercept term. You observe a collection $\{(X_i, Y_i)\}$ of data from this model, $i = 1, \dots, n$.

(a) (5 points) Write the objective function to be minimized and the equations that need to be solved to get the least squares estimate of $\beta$.

(b) (5 points) Solve the equation in (a) and express the answer as a linear combination of the $Y_i$'s.

(c) (5 points) What is the distribution of b? Find its mean and variance. Justify your steps.

(d) (5 points) Write the log-likelihood that needs to be maximized to obtain the estimate of $\beta$. DO NOT MAXIMIZE IT.

(a) Function to be minimized

Equations to be solved:
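The formulas for part (a) did not survive in this copy. Under the stated no-intercept model they can be reconstructed as follows; this is a sketch of the standard derivation, not the original solution text, and the name $Q$ for the objective is chosen here.

```latex
\begin{align*}
Q(\beta) &= \sum_{i=1}^{n} \left(Y_i - \beta X_i\right)^2, \\
\frac{dQ}{d\beta}\bigg|_{\beta = b}
  &= -2 \sum_{i=1}^{n} X_i \left(Y_i - b X_i\right) = 0
  \quad\Longleftrightarrow\quad
  \sum_{i=1}^{n} X_i Y_i = b \sum_{i=1}^{n} X_i^2 .
\end{align*}
```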

(b) Solving the equation:
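The worked solution is missing here as well. Solving the single normal equation of the no-intercept model gives the following, again as a reconstruction under the stated assumptions, with weights $c_i$ named to match the notation used in part (c).

```latex
\begin{align*}
b = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{j=1}^{n} X_j^2}
  = \sum_{i=1}^{n} c_i Y_i,
\qquad
c_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}.
\end{align*}
```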

(c) Since $b = \sum_i c_i Y_i$, a linear combination of normal RVs, it will be a normal RV itself.

The mean is

The variance is

We have shown that
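The expressions for the mean, the variance and the final conclusion are not reproduced in this copy. Assuming the weights $c_i = X_i / \sum_j X_j^2$ and treating the $X_i$ as fixed constants, they can be filled in as follows; this is a sketch of the usual argument rather than the original text.

```latex
\begin{align*}
E(b) &= \sum_{i} c_i\, E(Y_i) = \sum_{i} c_i\, \beta X_i
      = \beta\, \frac{\sum_i X_i^2}{\sum_j X_j^2} = \beta, \\
\mathrm{Var}(b) &= \sum_{i} c_i^2\, \mathrm{Var}(Y_i)
      = \sigma^2\, \frac{\sum_i X_i^2}{\left(\sum_j X_j^2\right)^2}
      = \frac{\sigma^2}{\sum_i X_i^2}
      \quad \text{(the } Y_i \text{ are independent)}, \\
b &\sim N\!\left(\beta,\ \frac{\sigma^2}{\sum_i X_i^2}\right).
\end{align*}
```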

(d) The log-likelihood is
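The log-likelihood expression is missing from this copy. For independent $Y_i \sim N(\beta X_i, \sigma^2)$ it takes the standard form below, written here as a reconstruction with $\ell$ denoting the log-likelihood.

```latex
\begin{align*}
\ell(\beta, \sigma^2)
  = -\frac{n}{2} \log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(Y_i - \beta X_i\right)^2 .
\end{align*}
```

For fixed $\sigma^2$, maximizing this over $\beta$ is the same as minimizing $\sum_i (Y_i - \beta X_i)^2$, which is the least squares objective from part (a).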
