F-test in Multiple Regression
Comparing Nested Models
Richard Ressler 2019-11-20
Learning Outcomes
Develop and Apply Tests for including multiple variables at the same time.
Compare Nested Models using the F-test and anova() in R.
References: Section 10.3 in the book
Case Study: Kentucky Derby
Speed vs Year and $\text{Year}^2$.
library(Sleuth3)
data(ex0920)
head(ex0920)
##   Year        Winner Starters NetToWinner   Time Speed
## 1 1896     Ben Brush        8        4850 127.75 35.23
## 2 1897    Typhoon II        6        4850 132.50 33.96
## 3 1898       Plaudit        4        4850 129.00 34.88
## 4 1899        Manuel        5        4850 132.00 34.09
## 5 1900 Lieut. Gibson        7        4850 126.25 35.64
## 6 1901  His Eminence        5        4850 127.75 35.23
Year vs Speed
library(ggplot2)
qplot(Year, Speed, data = ex0920) + geom_smooth(se = FALSE)
[Figure: scatterplot of Speed against Year with a smoothed trend curve.]
Goal
Get a p-value for the association between year and speed.
A linear model looks okay but a model with a quadratic term
might be better.
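$\mu(\text{Speed} \mid \text{Year}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2$
So to see if $\text{Year}^2$ is important, we need to test:
$H_0: \beta_2 = 0$ given $\beta_1 \neq 0$
$H_A: \beta_2 \neq 0$ given $\beta_1 \neq 0$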
Full and Reduced Models:
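Reduced Model: $\mu(\text{Speed} \mid \text{Year}) = \beta_0 + \beta_1 \text{Year}$
Full Model: $\mu(\text{Speed} \mid \text{Year}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2$
Use the F-test strategy to run this hypothesis test: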
1. Fit both full and reduced models.
2. Calculate sum of squared residuals under both models and the
corresponding degrees of freedom.
3. Calculate the F-statistic.
4. Compare the F-statistic to the theoretical F-distribution under $H_0$.
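The four steps above can be sketched by hand in R. This is a minimal illustration, not the course's own code: `f_test_nested` is a hypothetical helper name, and the example assumes the Sleuth3 package is installed (it is skipped otherwise). It writes the quadratic term as `I(Year^2)` instead of adding a new column.

```r
# Hypothetical helper (not part of base R or Sleuth3): steps 2 to 4 by hand.
f_test_nested <- function(reduced, full) {
  # Step 2: residual sums of squares and residual degrees of freedom
  rss_reduced <- sum(resid(reduced)^2)
  rss_full    <- sum(resid(full)^2)
  df_reduced  <- df.residual(reduced)
  df_full     <- df.residual(full)

  # Step 3: extra sum of squares per extra parameter,
  # divided by the full model's estimate of the residual variance
  df_extra <- df_reduced - df_full
  f_stat   <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)

  # Step 4: upper-tail probability of the F-distribution under H0
  p_value <- pf(f_stat, df1 = df_extra, df2 = df_full, lower.tail = FALSE)

  c(F = f_stat, df_extra = df_extra, df_full = df_full, p_value = p_value)
}

# Step 1: fit both models (assumes Sleuth3 is available)
if (requireNamespace("Sleuth3", quietly = TRUE)) {
  data(ex0920, package = "Sleuth3")
  lmreduced <- lm(Speed ~ Year, data = ex0920)
  lmfull    <- lm(Speed ~ Year + I(Year^2), data = ex0920)
  print(f_test_nested(lmreduced, lmfull))
}
```

The result should agree with the second row of `anova(lmreduced, lmfull)`, which performs the same calculation.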
Fit Under Reduced: Simple Linear Regression
[Figure: Speed against Year with the reduced-model fit.]
Residuals under Reduced
[Figure: Speed against Year, showing residuals from the reduced-model fit.]
Residuals against Fit for Reduced
[Figure: resid(lmreduced) against fitted(lmreduced).]
Fit under Full
[Figure: Speed against Year with the full-model fit.]
Residuals under Full
[Figure: Speed against Year, showing residuals from the full-model fit.]
Residuals against Fit for Full
[Figure: resid(lmfull) against fitted(lmfull).]
Running the F Test to compare the models in R
First, fit both reduced and full models.
Save the output to two different variables.
ex0920$Year2 <- ex0920$Year ^ 2
lmfull <- lm(Speed ~ Year + Year2, data = ex0920)
lmreduced <- lm(Speed ~ Year, data = ex0920)
Run anova() with the reduced model as the first argument.
anova(lmreduced, lmfull)
## Analysis of Variance Table
##
## Model 1: Speed ~ Year
## Model 2: Speed ~ Year + Year2
##   Res.Df    RSS Df Sum of Sq    F    Pr(>F)
## 1    114 41.837
## 2    113 33.083  1    8.7539 29.9 2.757e-07 ***
## ---
What is that Table?
## Analysis of Variance Table
##
## Model 1: Speed ~ Year
## Model 2: Speed ~ Year + Year2
##   Res.Df    RSS Df Sum of Sq    F    Pr(>F)
## 1    114 41.837
## 2    113 33.083  1    8.7539 29.9 2.757e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Res.Df       RSS           Df         Sum of Sq   F        Pr(>F)
df_reduced   RSS_reduced
df_full      RSS_full      df_extra   ESS         F-stat   p-value
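Using the labels in the key above, the F-statistic in the table can be reproduced by hand. The formula below is the standard extra-sum-of-squares F-statistic; the arithmetic uses the rounded values printed in the table, so the result is approximate.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}\right) / \left(df_{\text{reduced}} - df_{\text{full}}\right)}
         {\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
  = \frac{\mathrm{ESS} / df_{\text{extra}}}{\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
  = \frac{(41.837 - 33.083) / (114 - 113)}{33.083 / 113}
  \approx \frac{8.754}{0.2928}
  \approx 29.9
```

Under $H_0$ this statistic is compared to an F-distribution with $df_{\text{extra}} = 1$ and $df_{\text{full}} = 113$ degrees of freedom, which is where the p-value of 2.757e-07 comes from.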
F-test
We can use the F-test for any two nested models.
Nested: The reduced model is a special case of the full model created by setting constraints on some of the parameters of the full.
e.g., set one or more parameters to zero.
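For the common case where the constraint is "set some coefficients to zero", nesting can be checked roughly by comparing model terms. The sketch below is an illustration only: `is_nested_by_terms` is a hypothetical helper, it covers only zero constraints (not other linear constraints), and it uses the built-in `mtcars` data as a stand-in so that it runs without extra packages.

```r
# Hypothetical helper: TRUE when both models share the same response,
# use the same number of observations, and every term of `reduced`
# also appears in `full` (so `reduced` is `full` with some terms set to zero).
is_nested_by_terms <- function(reduced, full) {
  terms_reduced <- attr(terms(reduced), "term.labels")
  terms_full    <- attr(terms(full), "term.labels")
  same_response <- identical(formula(reduced)[[2]], formula(full)[[2]])
  same_n        <- nobs(reduced) == nobs(full)
  same_response && same_n && all(terms_reduced %in% terms_full)
}

# Stand-in example with built-in data
m_linear    <- lm(mpg ~ wt, data = mtcars)
m_quadratic <- lm(mpg ~ wt + I(wt^2), data = mtcars)
m_other     <- lm(mpg ~ hp + I(hp^2), data = mtcars)

is_nested_by_terms(m_linear, m_quadratic)  # linear is quadratic with one coefficient set to zero
is_nested_by_terms(m_linear, m_other)      # different predictors, so not nested
```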
Another Example: Starters Variable Marginal Fit
[Figure: Speed against Starters.]
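Consider adding variables for Starters and $\text{Starters}^2$
$\mu(\text{Speed} \mid \text{Year}, \text{Starters}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2 + \beta_3 \text{Starters} + \beta_4 \text{Starters}^2$
$H_0: \beta_3 = \beta_4 = 0$
$H_A:$ either $\beta_3 \neq 0$ or $\beta_4 \neq 0$
Full Model: $\mu(\text{Speed} \mid \text{Year}, \text{Starters}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2 + \beta_3 \text{Starters} + \beta_4 \text{Starters}^2$
Reduced Model: $\mu(\text{Speed} \mid \text{Year}, \text{Starters}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2$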
Fit and Run F Test in R
ex0920$Starters2 <- ex0920$Starters ^ 2
lmfull <- lm(Speed ~ Year + Year2 + Starters + Starters2, data = ex0920)
lmreduced <- lm(Speed ~ Year + Year2, data = ex0920)
anova(lmreduced, lmfull)
## Analysis of Variance Table
##
## Model 1: Speed ~ Year + Year2
## Model 2: Speed ~ Year + Year2 + Starters + Starters2
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1    113 33.083
## 2    111 30.903  2    2.1803 3.9156 0.02274 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
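Because two parameters are tested at once here, the extra sum of squares is divided by 2 before forming the ratio. A short sketch that recomputes the statistic and p-value from the values printed in the Starters table; the variable names are illustrative, and small differences from the table are expected because the inputs are rounded.

```r
# Values copied from the anova() output for the Starters example
rss_reduced <- 33.083
df_reduced  <- 113
rss_full    <- 30.903
df_full     <- 111

# Two extra parameters (Starters and Starters2)
df_extra <- df_reduced - df_full

# Extra sum of squares per extra parameter over the full-model variance estimate
f_stat <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)

# Upper-tail probability from an F-distribution with (2, 111) degrees of freedom
p_value <- pf(f_stat, df1 = df_extra, df2 = df_full, lower.tail = FALSE)

round(c(F = f_stat, p_value = p_value), 4)
```

The rounded results should land close to the table's F of 3.9156 and p-value of 0.02274.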
Example of a non-nested model
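Model 1: $\mu(\text{Speed} \mid \text{Year}, \text{Starters}) = \beta_0 + \beta_1 \text{Year} + \beta_2 \text{Year}^2$
Model 2: $\mu(\text{Speed} \mid \text{Year}, \text{Starters}) = \beta_0 + \beta_1 \text{Starters} + \beta_2 \text{Starters}^2$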
Cannot use an F-test to compare these two models.
Why? Mathematical theory only guarantees the F-distribution when the models are nested.
When models are not nested, use other methods to compare them,
e.g., adjusted $R^2$, $C_p$, AIC, or BIC from Section 12.4 (more on this later).
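As a rough preview, three of these criteria are available directly in base R. The sketch below is an illustration rather than the Section 12.4 procedure: `compare_fits` is a hypothetical helper, $C_p$ is omitted, and the example assumes the Sleuth3 package is installed (it is skipped otherwise).

```r
# Hypothetical helper: collect comparison criteria for any number of lm fits.
# Higher adjusted R-squared is better; lower AIC and BIC are better.
compare_fits <- function(...) {
  fits <- list(...)
  data.frame(
    model  = vapply(fits, function(m) paste(deparse(formula(m)), collapse = " "),
                    character(1)),
    adj_r2 = vapply(fits, function(m) summary(m)$adj.r.squared, numeric(1)),
    AIC    = vapply(fits, AIC, numeric(1)),
    BIC    = vapply(fits, BIC, numeric(1))
  )
}

# The two non-nested models from this slide (assumes Sleuth3 is available)
if (requireNamespace("Sleuth3", quietly = TRUE)) {
  data(ex0920, package = "Sleuth3")
  model1 <- lm(Speed ~ Year + I(Year^2), data = ex0920)
  model2 <- lm(Speed ~ Starters + I(Starters^2), data = ex0920)
  print(compare_fits(model1, model2))
}
```

These criteria rank models but do not produce a p-value, so they answer a different question than the F-test.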