STAT 614 Week 13: General Strategy for Model Building
Richard Ressler 2019-11-20

Learning Outcomes
• Develop and apply a General Strategy for variable selection in multiple linear regression.
• References: Chapter 12

Seven (+1) Step General Strategy for Variable Selection
1. Identify Objectives and Questions of Interest
2. Screen the available variables: Identify candidate variables
3. Exploratory data analysis: Look for Relationships, Assumptions, Correlations
4. Transformations based on EDA.
5. Fit a rich model and look at residuals – Iterate
6. If appropriate, use a computer aided technique to choose a suitable subset of explanatory variables.
7. Proceed with analysis with chosen explanatory variables.
8. Communicate Results clearly

Step 1: Identify Objectives and Questions of Interest
• Example 1: Interested in association of one explanatory variable and one response.
• Goal is to determine the association after adjusting for other variables.
• Perform variable selection with everything except the explanatory variable of interest.
• Once the model is selected, include the variable of interest to test for the association.

Step 1: Identify Objectives and Questions of Interest
• Example 2: Just want to fish for associations
• Iterate through adding/removing variables, making transformations, checking residuals, until you develop a model with significant terms and no major issues.
• p-values and confidence intervals don’t have their proper interpretation.
• Same problem as with multiple comparisons: you ran many tests and looked at the data a lot to arrive at the final model.
• You generally build a model and tell stories with it.
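
The multiple-comparisons point can be checked with a small simulation. The sketch below is illustrative only: the sample size, the number of candidate variables, and the function names are assumptions, not part of the course material. Every candidate is pure noise, yet the best of 20 candidates passes a 5% test far more often than 5% of the time (about 1 - 0.95^20 ≈ 0.64 for independent tests).

```python
import math
import random

N_OBS = 30
# Two-sided 5% critical value of Student's t with N_OBS - 2 = 28 degrees of
# freedom (table value). It must be changed if N_OBS changes.
T_CRIT = 2.048


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n_candidates, n_sims=500, seed=614):
    """Share of pure-noise data sets where the best candidate looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(N_OBS)]
        best_t = 0.0
        for _ in range(n_candidates):
            # Each candidate is generated independently of y, so any
            # "association" found here is spurious by construction.
            x = [rng.gauss(0, 1) for _ in range(N_OBS)]
            r = pearson_r(x, y)
            t = abs(r) * math.sqrt((N_OBS - 2) / (1 - r * r))
            best_t = max(best_t, t)
        if best_t > T_CRIT:
            hits += 1
    return hits / n_sims


rate_single = false_positive_rate(n_candidates=1)
rate_fishing = false_positive_rate(n_candidates=20)
print(f"one pre-specified test: {rate_single:.2f}")
print(f"best of 20 candidates:  {rate_fishing:.2f}")
```

Only the standard library is used so the sketch runs anywhere; a real analysis would fit the models in a statistics package.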

Step 1: Identify Objectives and Questions of Interest
• Example 3: Prediction
• Include variables to maximize predictive power; don’t worry about interpretation.

Step 2: Screen Available Variables
• Choose a list of explanatory variables that are important to the objective.
• Screen out redundant variables
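
One possible way to spot redundant candidates is to flag pairs of variables that are almost perfectly correlated. This is a minimal sketch under assumptions: the slides do not say how redundancy should be judged, and the 0.95 threshold, the data, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical data: height recorded in two units, plus age.
candidates = {
    "height_cm": [150.0, 160.0, 165.0, 172.0, 180.0, 191.0],
    "height_in": [59.1, 63.0, 65.0, 67.7, 70.9, 75.2],
    "age": [34.0, 51.0, 23.0, 45.0, 29.0, 62.0],
}
flagged = redundant_pairs(candidates)
print(flagged)
```

A flagged pair is only a prompt to think: which of the two to keep depends on the objective from Step 1.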

Problems with Including Too Few Variables
• You are only picking up marginal associations.
• E.g., we already know men make more money than women. We want to see if men still make more money than women when we control for other variables.
• Predictions are less accurate.
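
The loss of predictive accuracy can be sketched numerically. The example below assumes a true relationship y = x + noise with noise SD 0.1 and x uniform on [0, 1]; these values are illustrative guesses, not taken from the slides. It compares the residual standard deviation of a model that uses x with one that ignores it. The width of a prediction interval is roughly proportional to that standard deviation.

```python
import math
import random

rng = random.Random(13)
n = 200
x = [rng.random() for _ in range(n)]
y = [xi + rng.gauss(0, 0.1) for xi in x]  # assumed truth: slope 1, noise SD 0.1

# Model with x: simple linear regression by least squares.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
rss_with_x = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

# Model without x: intercept only, so every prediction is the mean of y.
rss_without_x = sum((yi - y_bar) ** 2 for yi in y)

# Residual standard deviations (n - 2 and n - 1 residual degrees of freedom).
sigma_with_x = math.sqrt(rss_with_x / (n - 2))
sigma_without_x = math.sqrt(rss_without_x / (n - 1))

print(f"residual SD with x:    {sigma_with_x:.3f}")
print(f"residual SD without x: {sigma_without_x:.3f}")
```

Leaving out x pushes all of the variation it would have explained into the error term, which is why the intervals without x are wider.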

Too few variables: Predictions are less accurate
[Figure: Prediction Intervals with X; y plotted against x]
[Figure: Prediction Intervals without X; y plotted against x]
[Figure: Prediction Intervals without X; y axis only]

Problems with too many variables
• Harder to estimate more parameters. Model tends to overfit.
• Formally, the variances of the sampling distributions of the coefficients in the model will get much larger.
• Including highly correlated explanatory variables greatly increases the variance of the sampling distributions of the coefficient estimates.
• Intuitively, we are less sure whether the association between Y and X1 is an actual association or is mediated through X2.
• Predictions are less accurate.

Demonstration of adding a highly-correlated variate
[Figure: X1 and X2 are highly correlated; x2 plotted against x1, grouped by quartile of x2 (x2_q: 1st to 4th quartile)]

Truth
• True model: Y = X1 + ε, so μ(Y | X1) = X1
• Fitted model: μ(Y | X1, X2) = β0 + β1X1 + β2X2
• Correlation between X1 and X2 is 0.9994352.
• We will simulate new values of Y and plot the resulting OLS estimates.
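
The simulation described above can be sketched as follows. The sample size, the noise SD, and the way x2 is built from x1 are assumptions chosen to give a correlation near the one quoted; they are not the settings used for the slides, so the exact correlation will differ. With x1 and x2 held fixed, new values of Y are drawn repeatedly and β1 is estimated twice, once from the model with x1 alone and once from the model with x1 and x2.

```python
import math
import random
import statistics

rng = random.Random(2019)
n, sigma, n_iter = 100, 0.1, 1000  # assumed sample size, noise SD, iterations

x1 = [rng.random() for _ in range(n)]
x2 = [a + rng.gauss(0, 0.01) for a in x1]  # x2 is x1 plus a little noise


def centered(v):
    """Subtract the mean, which absorbs the intercept of the regression."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


c1, c2 = centered(x1), centered(x2)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
det = s11 * s22 - s12 ** 2  # determinant of the 2x2 normal equations
corr = s12 / math.sqrt(s11 * s22)

beta1_x1, beta1_x1_with_x2 = [], []
for _ in range(n_iter):
    y = [a + rng.gauss(0, sigma) for a in x1]  # true model: Y = X1 + eps
    cy = centered(y)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    beta1_x1.append(s1y / s11)  # OLS slope from y ~ x1
    # OLS coefficient of x1 from y ~ x1 + x2 (Cramer's rule on the 2x2 system).
    beta1_x1_with_x2.append((s22 * s1y - s12 * s2y) / det)

sd_x1 = statistics.stdev(beta1_x1)
sd_x1_with_x2 = statistics.stdev(beta1_x1_with_x2)
print(f"correlation of x1 and x2:        {corr:.4f}")
print(f"SD of beta1 estimates, x1 only:  {sd_x1:.3f}")
print(f"SD of beta1 estimates, x1 + x2:  {sd_x1_with_x2:.3f}")
```

Both sets of estimates are centered on the true value of 1; what changes is their spread, which grows by roughly the square root of 1 / (1 - r^2) when x2 is added.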

Demonstration: Black is true β1
[Figure, repeated for five simulated data sets: y plotted against x1, grouped by quartile of x2 (x2_q: 1st to 4th quartile)]

Variability of β1
[Figure: Beta_1 estimates over 1000 iterations, by Model (x1 and x1_with_x2)]

Steps 3 through 5
3. Exploratory data analysis.
• Tons of scatterplots.
• Look at correlation coefficients.
4. Transformations based on EDA.
5. Fit a rich model and look at residuals.
• Look for curvature, non-constant variance, and outliers.
• Iterate the above steps until you don’t see any issues.

Step 6
• If appropriate, use a computer-aided technique to choose a suitable subset of explanatory variables.
• F-test if nested models
• step()
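
For reference, the F-test for two nested models is usually written in terms of residual sums of squares. The notation below (RSS for the residual sum of squares, df for the residual degrees of freedom) is the standard textbook form and is introduced here as an assumption; the slides do not define it.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}\right) / \left(\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}}\right)}
         {\mathrm{RSS}_{\text{full}} / \mathrm{df}_{\text{full}}}
```

If the extra coefficients in the full model are all zero and the usual model assumptions hold, F follows an F distribution with df_reduced - df_full and df_full degrees of freedom, so a large F is evidence that at least one of the extra terms is needed.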

Steps 7 and 8
7. Proceed with analysis with chosen explanatory variables.
8. Tell stories with the data using p-values, coefficient estimates, confidence intervals, etc.