FES 844b / STAT 660b 2003
FES758b / STAT660b
Multivariate Statistics
Homework #5 : MANOVA / GLM
Due : Tuesday, 4/11/17 on CANVAS (by midnight)
HOMEWORK WORKED OUT FOR SAMPLE DATASET
The example below is JUST FOR YOUR PRACTICE.
NOTHING TO TURN IN HERE!
DANIELA.csv contains data collected by Daniela Cusack from three plantations. Each
plantation was divided into areas with homogeneous overstory tree species of six types.
In class we looked at factors which predicted the number of individual saplings in each
of three height classes. The saplings were also classified in terms of the dispersal
mechanism associated with that species. The three dispersal mechanisms (the
response variables) were birds, mammals, or other (includes wind, water, bats, gravity).
SAS Results
1. Look at interaction plots between plantation and overstory species for each of the
dispersal mechanisms. Discuss what you see.
Here is SAS code for this :
*MUST SORT DATA TO GET INTERACTION PLOTS!!!;
PROC SORT DATA=IN.overstory; BY treatment plantation;
PROC MEANS DATA=IN.overstory;
VAR mammals birds other;
BY treatment plantation;
OUTPUT OUT=overMEAN;
RUN;
DATA OVERMEAN; SET OVERMEAN; IF _STAT_='MEAN';
RUN;
SYMBOL1 VALUE=OC=BLACK H=2 W=5 I=JOIN;
SYMBOL2 VALUE=OC=RED H=2 W=5 I=JOIN;
SYMBOL3 VALUE=OC=GREEN H=2 W=5 I=JOIN;
PROC GPLOT DATA=overmean;
PLOT mammals*treatment=plantation;
PLOT birds*treatment=plantation;
PLOT other*treatment=plantation;
RUN;
Here are results :
These plots suggest that there may be
an interaction between plantation and
overstory species (i.e. treatment). They also
suggest that there may not be much
of a treatment effect (i.e. overstory type
doesn't affect dispersal rate). It also
seems that, overall, the S plantation
has generally higher rates than the
other plantations.
2. Run MANOVA for these two categorical factors. Discuss your results, both
univariate and multivariate.
Here is code and results for the GLM model (i.e. two-way MANOVA) :
proc glm data=in.overstory;
class treatment plantation;
model mammals birds other=plantation treatment plantation*treatment /
solution;
manova h=plantation treatment plantation*treatment;
run;
[Figures: three interaction plots, one each for other, birds, and mammals. Each plots the
mean count (vertical axis) against Treatment (Cb, Ha, Ta, Vf, Vg, Vk), with a separate
line for each Plantation (P, Q, S).]
Results follow (lots of output).
Univariate Results : For mammals, there are differences between plantations. There
are no overall observed differences due to the overstory treatment effect; however, there is
evidence of an interaction effect.
The individual coefficients suggest that, for overstory species, the Ta species is different
from the Vk (and perhaps other) species; you might want to test this as an indicator variable.
Similar results are observed for birds. No such Ta species effect is observed for the other
dispersal mechanisms.
Multivariate Results : Overall, there are differences between Plantations (all
multivariate statistics are significant). For overstory species, only Roy's Greatest Root
is significant, which suggests that there is a single direction in multivariate space that
shows differences between overstory treatment groups (since Roy's Greatest Root only
tests the first eigenvalue, which is associated with the direction of maximum
discrimination). Most of the multivariate tests suggest there is an interaction effect
between Plantation and overstory Treatment.
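If you want to see how the four multivariate statistics relate to the eigenvalues, here is a minimal Python sketch (Python is my choice for illustration only; it is not part of the assignment). It takes the three characteristic roots that SAS reports for the Plantation effect and rebuilds the statistics using the standard formulas. The variable names are just for this example.

```python
# Characteristic roots of E^{-1}H for Plantation, copied from the SAS output.
roots = [0.65908195, 0.04220450, 0.00000000]

# Wilks' Lambda is the product of 1 / (1 + root) over all roots.
wilks = 1.0
for r in roots:
    wilks *= 1.0 / (1.0 + r)

# Pillai's Trace is the sum of root / (1 + root).
pillai = sum(r / (1.0 + r) for r in roots)

# Hotelling-Lawley Trace is the sum of the roots.
hotelling = sum(roots)

# Roy's Greatest Root, as SAS reports it, is the largest root.
roy = max(roots)

print(round(wilks, 6), round(pillai, 6), round(hotelling, 6), round(roy, 6))
```

Because Roy's statistic uses only the largest root, it can be significant when the other three (which pool all the roots) are not, which is what happens for Treatment.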
Dependent Variable: mammals

                                     Sum of
Source               DF             Squares     Mean Square    F Value    Pr > F
Model                17          55.6250000       3.2720588       2.79    0.0021
Error                54          63.2500000       1.1712963
Corrected Total      71         118.8750000

R-Square     Coeff Var      Root MSE    mammals Mean
0.467928      83.78821      1.082264        1.291667

Source                    DF       Type I SS     Mean Square    F Value    Pr > F
Plantation                 2     18.75000000      9.37500000       8.00    0.0009
Treatment                  5      6.79166667      1.35833333       1.16    0.3410
Treatment*Plantation      10     30.08333333      3.00833333       2.57    0.0128

Source                    DF     Type III SS     Mean Square    F Value    Pr > F
Plantation                 2     18.75000000      9.37500000       8.00    0.0009
Treatment                  5      6.79166667      1.35833333       1.16    0.3410
Treatment*Plantation      10     30.08333333      3.00833333       2.57    0.0128
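To see where the F values and R-Square in the mammals table come from, here is a short Python sketch (again, for illustration only; the names are mine). Each F value is the effect mean square divided by the error mean square, and R-Square is the model sum of squares divided by the corrected total sum of squares.

```python
# Sums of squares and degrees of freedom copied from the mammals ANOVA table.
ms_error = 63.25 / 54  # error SS / error DF

effects = {
    "Plantation": (18.75, 2),
    "Treatment": (6.79166667, 5),
    "Treatment*Plantation": (30.08333333, 10),
}

# F = (effect SS / effect DF) / (error mean square)
f_values = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

# R-Square = model SS / corrected total SS
r_square = 55.625 / 118.875

for name, f in f_values.items():
    print(name, round(f, 2))
print("R-Square", round(r_square, 6))
```

The same arithmetic applies to the birds and other tables.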
                                                      Standard
Parameter                            Estimate            Error    t Value    Pr > |t|
Intercept                         3.000000000 B     0.54113221       5.54      <.0001
Plantation P                     -2.000000000 B     0.76527652      -2.61      0.0116
Plantation Q                     -1.500000000 B     0.76527652      -1.96      0.0552
Plantation S                      0.000000000 B      .                .         .
Treatment Cb                     -0.750000000 B     0.76527652      -0.98      0.3314
Treatment Ha                     -1.250000000 B     0.76527652      -1.63      0.1082
Treatment Ta                     -2.500000000 B     0.76527652      -3.27      0.0019
Treatment Vf                     -1.500000000 B     0.76527652      -1.96      0.0552
Treatment Vg                     -0.500000000 B     0.76527652      -0.65      0.5163
Treatment Vk                      0.000000000 B      .                .         .
Treatment*Plantation Cb P         0.000000000 B     1.08226443       0.00      1.0000
Treatment*Plantation Cb Q         1.250000000 B     1.08226443       1.15      0.2532
Treatment*Plantation Cb S         0.000000000 B      .                .         .
Treatment*Plantation Ha P         2.250000000 B     1.08226443       2.08      0.0424
Treatment*Plantation Ha Q         0.000000000 B     1.08226443       0.00      1.0000
Treatment*Plantation Ha S         0.000000000 B      .                .         .
Treatment*Plantation Ta P         2.000000000 B     1.08226443       1.85      0.0701
Treatment*Plantation Ta Q         2.750000000 B     1.08226443       2.54      0.0140
Treatment*Plantation Ta S         0.000000000 B      .                .         .
Treatment*Plantation Vf P         0.500000000 B     1.08226443       0.46      0.6459
Treatment*Plantation Vf Q         1.750000000 B     1.08226443       1.62      0.1117
Treatment*Plantation Vf S         0.000000000 B      .                .         .
Treatment*Plantation Vg P        -0.250000000 B     1.08226443      -0.23      0.8182
Treatment*Plantation Vg Q        -0.500000000 B     1.08226443      -0.46      0.6459
Treatment*Plantation Vg S         0.000000000 B      .                .         .
Treatment*Plantation Vk P         0.000000000 B      .                .         .
Treatment*Plantation Vk Q         0.000000000 B      .                .         .
Treatment*Plantation Vk S         0.000000000 B      .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter B are not
uniquely estimable.
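The "B" estimates are not unique because SAS sets the last level of each factor (Plantation S, Treatment Vk) to zero, but the fitted cell means they imply are unique. Here is a minimal Python sketch of how to rebuild a cell mean for mammals from the estimates above. The function and dictionary names are mine, and the check against the overall mean assumes the design is balanced (equal numbers of plots per cell), which is consistent with the identical Type I and Type III sums of squares.

```python
# Estimates copied from the mammals parameter table; omitted terms are 0.
intercept = 3.0
plantation = {"P": -2.0, "Q": -1.5, "S": 0.0}
treatment = {"Cb": -0.75, "Ha": -1.25, "Ta": -2.5,
             "Vf": -1.5, "Vg": -0.5, "Vk": 0.0}
interaction = {
    ("Cb", "P"): 0.0,   ("Cb", "Q"): 1.25,
    ("Ha", "P"): 2.25,  ("Ha", "Q"): 0.0,
    ("Ta", "P"): 2.0,   ("Ta", "Q"): 2.75,
    ("Vf", "P"): 0.5,   ("Vf", "Q"): 1.75,
    ("Vg", "P"): -0.25, ("Vg", "Q"): -0.5,
}

def cell_mean(trt, plant):
    """Fitted mean = intercept + plantation + treatment + interaction term."""
    return (intercept + plantation[plant] + treatment[trt]
            + interaction.get((trt, plant), 0.0))

# Example: Ta overstory in plantation Q.
print(cell_mean("Ta", "Q"))

# With a balanced design, the average of the 18 cell means is the overall mean.
cells = [cell_mean(t, p) for p in plantation for t in treatment]
grand_mean = sum(cells) / len(cells)
print(round(grand_mean, 6))
```

Note that the intercept (3.0) is itself just the fitted mean for the reference cell, Vk in plantation S.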
Dependent Variable: birds

                                     Sum of
Source               DF             Squares     Mean Square    F Value    Pr > F
Model                17         124.5694444       7.3276144       3.07    0.0009
Error                54         128.7500000       2.3842593
Corrected Total      71         253.3194444

R-Square     Coeff Var      Root MSE    birds Mean
0.491748      71.72615      1.544105      2.152778

Source                    DF       Type I SS     Mean Square    F Value    Pr > F
Plantation                 2     42.19444444     21.09722222       8.85    0.0005
Treatment                  5      8.73611111      1.74722222       0.73    0.6020
Treatment*Plantation      10     73.63888889      7.36388889       3.09    0.0036

Source                    DF     Type III SS     Mean Square    F Value    Pr > F
Plantation                 2     42.19444444     21.09722222       8.85    0.0005
Treatment                  5      8.73611111      1.74722222       0.73    0.6020
Treatment*Plantation      10     73.63888889      7.36388889       3.09    0.0036
                                                      Standard
Parameter                            Estimate            Error    t Value    Pr > |t|
Intercept                         4.000000000 B     0.77205234       5.18      <.0001
Plantation P                     -2.750000000 B     1.09184689      -2.52      0.0148
Plantation Q                     -2.750000000 B     1.09184689      -2.52      0.0148
Plantation S                      0.000000000 B      .                .         .
Treatment Cb                      0.250000000 B     1.09184689       0.23      0.8198
Treatment Ha                     -1.750000000 B     1.09184689      -1.60      0.1148
Treatment Ta                     -2.500000000 B     1.09184689      -2.29      0.0260
Treatment Vf                     -0.500000000 B     1.09184689      -0.46      0.6488
Treatment Vg                     -0.250000000 B     1.09184689      -0.23      0.8198
Treatment Vk                      0.000000000 B      .                .         .
Treatment*Plantation Cb P        -0.750000000 B     1.54410468      -0.49      0.6291
Treatment*Plantation Cb Q         2.000000000 B     1.54410468       1.30      0.2007
Treatment*Plantation Cb S         0.000000000 B      .                .         .
Treatment*Plantation Ha P         3.250000000 B     1.54410468       2.10      0.0400
Treatment*Plantation Ha Q         0.750000000 B     1.54410468       0.49      0.6291
Treatment*Plantation Ha S         0.000000000 B      .                .         .
Treatment*Plantation Ta P         3.750000000 B     1.54410468       2.43      0.0185
Treatment*Plantation Ta Q         3.750000000 B     1.54410468       2.43      0.0185
Treatment*Plantation Ta S         0.000000000 B      .                .         .
Treatment*Plantation Vf P        -0.750000000 B     1.54410468      -0.49      0.6291
Treatment*Plantation Vf Q         2.250000000 B     1.54410468       1.46      0.1509
Treatment*Plantation Vf S         0.000000000 B      .                .         .
Treatment*Plantation Vg P         0.250000000 B     1.54410468       0.16      0.8720
Treatment*Plantation Vg Q        -0.500000000 B     1.54410468      -0.32      0.7473
Treatment*Plantation Vg S         0.000000000 B      .                .         .
Treatment*Plantation Vk P         0.000000000 B      .                .         .
Treatment*Plantation Vk Q         0.000000000 B      .                .         .
Treatment*Plantation Vk S         0.000000000 B      .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter B are not
uniquely estimable.
Dependent Variable: other

                                     Sum of
Source               DF             Squares     Mean Square    F Value    Pr > F
Model                17         26.12500000      1.53676471       2.46    0.0063
Error                54         33.75000000      0.62500000
Corrected Total      71         59.87500000

R-Square     Coeff Var      Root MSE    other Mean
0.436326      99.86140      0.790569      0.791667

Source                    DF       Type I SS     Mean Square    F Value    Pr > F
Plantation                 2      8.58333333      4.29166667       6.87    0.0022
Treatment                  5      4.79166667      0.95833333       1.53    0.1950
Treatment*Plantation      10     12.75000000      1.27500000       2.04    0.0466

Source                    DF     Type III SS     Mean Square    F Value    Pr > F
Plantation                 2      8.58333333      4.29166667       6.87    0.0022
Treatment                  5      4.79166667      0.95833333       1.53    0.1950
Treatment*Plantation      10     12.75000000      1.27500000       2.04    0.0466
                                                      Standard
Parameter                            Estimate            Error    t Value    Pr > |t|
Intercept                         1.000000000 B     0.39528471       2.53      0.0144
Plantation P                     -1.000000000 B     0.55901699      -1.79      0.0792
Plantation Q                     -0.750000000 B     0.55901699      -1.34      0.1853
Plantation S                      0.000000000 B      .                .         .
Treatment Cb                      1.000000000 B     0.55901699       1.79      0.0792
Treatment Ha                     -0.250000000 B     0.55901699      -0.45      0.6565
Treatment Ta                     -0.500000000 B     0.55901699      -0.89      0.3751
Treatment Vf                      0.250000000 B     0.55901699       0.45      0.6565
Treatment Vg                      1.000000000 B     0.55901699       1.79      0.0792
Treatment Vk                      0.000000000 B      .                .         .
Treatment*Plantation Cb P        -0.250000000 B     0.79056942      -0.32      0.7530
Treatment*Plantation Cb Q        -0.500000000 B     0.79056942      -0.63      0.5298
Treatment*Plantation Cb S         0.000000000 B      .                .         .
Treatment*Plantation Ha P         1.000000000 B     0.79056942       1.26      0.2113
Treatment*Plantation Ha Q         0.000000000 B     0.79056942       0.00      1.0000
Treatment*Plantation Ha S         0.000000000 B      .                .         .
Treatment*Plantation Ta P         1.250000000 B     0.79056942       1.58      0.1197
Treatment*Plantation Ta Q         1.500000000 B     0.79056942       1.90      0.0631
Treatment*Plantation Ta S         0.000000000 B      .                .         .
Treatment*Plantation Vf P        -0.250000000 B     0.79056942      -0.32      0.7530
Treatment*Plantation Vf Q         1.000000000 B     0.79056942       1.26      0.2113
Treatment*Plantation Vf S         0.000000000 B      .                .         .
Treatment*Plantation Vg P        -0.750000000 B     0.79056942      -0.95      0.3470
Treatment*Plantation Vg Q        -0.750000000 B     0.79056942      -0.95      0.3470
Treatment*Plantation Vg S         0.000000000 B      .                .         .
Treatment*Plantation Vk P         0.000000000 B      .                .         .
Treatment*Plantation Vk Q         0.000000000 B      .                .         .
Treatment*Plantation Vk S         0.000000000 B      .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter B are not
uniquely estimable.
The GLM Procedure
Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Plantation
E = Error SSCP Matrix

Characteristic                  Characteristic Vector V'EV=1
          Root    Percent        mammals          birds          other
    0.65908195      93.98     0.02545218     0.05607267     0.11657032
    0.04220450       6.02    -0.16462472     0.10087871     0.01551757
    0.00000000       0.00    -0.03795726    -0.03382330     0.12965599

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Plantation Effect
H = Type III SSCP Matrix for Plantation
E = Error SSCP Matrix
S=2  M=0  N=25

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.57833466       5.46         6       104    <.0001
Pillai's Trace            0.43775243       4.95         6       106    0.0002
Hotelling-Lawley Trace    0.70128645       6.02         6    67.584    <.0001
Roy's Greatest Root       0.65908195      11.64         3        53    <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Treatment
E = Error SSCP Matrix

Characteristic                  Characteristic Vector V'EV=1
          Root    Percent        mammals          birds          other
    0.37140765      80.86    -0.12906445     0.08720092     0.12616486
    0.07431415      16.18     0.09056686     0.02757895     0.04952936
    0.01360850       2.96    -0.06580182     0.07810298    -0.11076723

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Treatment Effect
H = Type III SSCP Matrix for Treatment
E = Error SSCP Matrix
S=3  M=0.5  N=25

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.66962536       1.50        15    143.95    0.1122
Pillai's Trace            0.35342158       1.44        15       162    0.1335
Hotelling-Lawley Trace    0.45933030       1.56        15    93.113    0.0991
Roy's Greatest Root       0.37140765       4.01         5        54    0.0036
NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Treatment*Plantation
E = Error SSCP Matrix

Characteristic                  Characteristic Vector V'EV=1
          Root    Percent        mammals          birds          other
    0.96430157      83.05     0.02260670     0.06053308     0.11038347
    0.12388040      10.67    -0.14514093     0.10390370    -0.03122217
    0.07292461       6.28    -0.08725202     0.00210760     0.13221488

MANOVA Test Criteria and F Approximations for the Hypothesis
of No Overall Treatment*Plantation Effect
H = Type III SSCP Matrix for Treatment*Plantation
E = Error SSCP Matrix
S=3  M=3  N=25

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.42218474       1.75        30    153.31    0.0158
Pillai's Trace            0.66910687       1.55        30       162    0.0449
Hotelling-Lawley Trace    1.16110658       1.97        30    111.02    0.0059
Roy's Greatest Root       0.96430157       5.21        10        54    <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.

3. Perform (multivariate) contrasts to compare levels of a particular factor or
combinations of factors. Discuss your results.
To do this, we have to analyze this as a ONE WAY MANOVA with a combination treatment
effect. The code below creates a new variable called trtcombine which just combines
plantation code and treatment code. There are LOTS of possibilities here. I contrasted
Plantations P and Q, and P and S. I contrasted overstory species Vf vs. Vg. Finally, I
looked at the interaction between Vf and Vg between plots P and Q.
data in.overstory; set in.overstory;
trtcombine=trim(trim(plantation) || trim(treatment));
run;
To run the actual model with contrasts use
proc sort data=in.overstory; by trtcombine;
proc glm data=in.overstory;
class trtcombine;
model mammals birds other=trtcombine;
*trtcombine has 18 levels, assumed to sort as PCb PHa PTa PVf PVg PVk QCb ... SVk;
*so each contrast takes 18 coefficients, six per plantation;
contrast 'P vs Q' trtcombine 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 0 0 0 0 0 0;
contrast 'P vs S' trtcombine 1 1 1 1 1 1 0 0 0 0 0 0 -1 -1 -1 -1 -1 -1;
contrast 'Vf vs Vg' trtcombine 0 0 0 1 -1 0 0 0 0 1 -1 0 0 0 0 1 -1 0;
contrast 'Vf vs Vg PQ interaction' trtcombine 0 0 0 1 -1 0 0 0 0 -1 1 0 0 0 0 0 0 0;
manova h=trtcombine;
run;
Lots of output, I leave interpretation to you.
4. Daniela also measured the amount of light and the density of forest litter on each
plot. Fit a model that includes these covariates as predictors of the number of
saplings associated with each dispersal mechanism.
Use PROC PLOT to make plots to check for linearity. For Daniela's data, we add the
possible covariate effects of light and litter. The SAS code is
PROC GLM DATA=IN.OVERSTORY;
CLASS TREATMENT PLANTATION;
MODEL MAMMALS BIRDS OTHER=TREATMENT PLANTATION TREATMENT*PLANTATION LIGHT LITTER /
SOLUTION;
MANOVA H=TREATMENT PLANTATION TREATMENT*PLANTATION LIGHT LITTER;
RUN;
The output shows that neither covariate appears to have a significant univariate or
multivariate effect on the dispersal mechanism counts.

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Light Effect
H = Type III SSCP Matrix for Light
E = Error SSCP Matrix
S=1  M=0.5  N=22

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.91994901       1.33         3        46    0.2748
Pillai's Trace            0.08005099       1.33         3        46    0.2748
Hotelling-Lawley Trace    0.08701677       1.33         3        46    0.2748
Roy's Greatest Root       0.08701677       1.33         3        46    0.2748

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Litter Effect
H = Type III SSCP Matrix for Litter
E = Error SSCP Matrix
S=1  M=0.5  N=22

Statistic                      Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.99465782       0.08         3        46    0.9693
Pillai's Trace            0.00534218       0.08         3        46    0.9693
Hotelling-Lawley Trace    0.00537088       0.08         3        46    0.9693
Roy's Greatest Root       0.00537088       0.08         3        46    0.9693

5. Check model assumptions by making a chi-square quantile plot of the residuals.
Modify your model as appropriate based on your findings.
I've already discussed how to make chi-square quantile plots. You just need the
residuals. The SAS code simply adds an OUTPUT statement to PROC GLM :
PROC GLM DATA=IN.OVERSTORY ;
CLASS TREATMENT PLANTATION;
MODEL MAMMALS BIRDS OTHER=TREATMENT PLANTATION TREATMENT*PLANTATION;
MANOVA H=TREATMENT PLANTATION TREATMENT*PLANTATION;
OUTPUT OUT=OUTSTAT RESIDUAL=RESIDUALA RESIDUALB RESIDUALC;
RUN;
%INCLUDE 'C:\Documents and Settings\jon\My Documents\Classes\Multivariate\Articles Programs Resources\Software Programs\SAS programs\MULTNORM.SAS';
%MULTNORM(VAR= RESIDUALA RESIDUALB RESIDUALC, DATA=OUTSTAT)
The resulting plot looks good : no evidence of serious departure from multivariate
normality.
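As a reminder of what a chi-square quantile plot actually computes, here is a minimal Python sketch. This is NOT the MULTNORM macro, just the basic idea under my own simplifying assumptions: it handles only two residual columns (so the chi-square quantiles with 2 degrees of freedom have the closed form -2 ln(1 - p)), it uses plotting positions (i - 0.5)/n, and the residual values are made up for illustration. For the three residuals in the model above you would use 3 degrees of freedom and a 3 by 3 covariance matrix.

```python
import math

def chisq_qq_points(rows):
    """Return (chi-square quantile, ordered squared distance) pairs
    for two-column data, using the sample covariance matrix."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Squared Mahalanobis distance of each row from the mean,
    # written out with the explicit inverse of a 2 x 2 matrix.
    d2 = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    d2.sort()
    # Chi-square quantile with 2 df at plotting position p = (i - 0.5) / n.
    quantiles = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, d2))

# Made-up residual pairs, for illustration only.
toy_residuals = [(0.5, -0.2), (-1.0, 0.3), (0.2, 0.9), (1.1, -0.4), (-0.8, -0.6)]
points = chisq_qq_points(toy_residuals)
for q, d in points:
    print(round(q, 3), round(d, 3))
```

If the residuals are roughly multivariate normal, plotting the ordered distances against the quantiles should give points near a straight line through the origin with slope 1.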
SPSS Results
1. Look at interaction plots between plantation and overstory species for each of the
dispersal mechanisms. Discuss what you see.
To make interaction plots in SPSS (called Profile Plots), use Analyze > General Linear
Model > Multivariate. This will give you the plots and the model needed for question two.
Indicate that mammals, birds and other are the Dependent Variables and that Plantation
and Treatment are Fixed Factors.
Click on Plots, indicate that you'd like Treatment on the Horizontal Axis and Plantation
as Separate Lines, then click ADD.
See SAS section for interpretation of results and similar plots.
2. Run MANOVA for these two categorical factors. Discuss your results, both
univariate and multivariate.
3. Perform (multivariate) contrasts to compare levels of a particular factor or
combinations of factors. Discuss your results.
To make the new combination treatment variable, you can do it manually in EXCEL, or in
SPSS use TRANSFORM > COMPUTE and use the Concatenate function. Make sure you indicate
that the data type of the new variable is STRING.
To run the contrasts, click on the CONTRASTS button in SPSS when running Analyze >
General Linear Model. However, this only gives UNIVARIATE contrasts. No multivariate
contrasts are available that I know of, sorry!
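Whichever package you use, the contrast itself is just a vector of coefficients over the 18 levels of the combined plantation/treatment variable. Here is a small Python sketch (for illustration only) of how those vectors line up with the levels. It assumes the levels sort alphabetically as PCb, PHa, PTa, PVf, PVg, PVk, QCb, ..., SVk; the helper name contrast is mine.

```python
plantations = ["P", "Q", "S"]
treatments = ["Cb", "Ha", "Ta", "Vf", "Vg", "Vk"]

# The 18 combined levels, in the sort order assumed above.
levels = [p + t for p in plantations for t in treatments]

def contrast(weights):
    """Expand a {level: coefficient} dict into a full 18-element vector."""
    return [weights.get(level, 0) for level in levels]

# Plantation P vs plantation Q, averaged over all six overstory species.
p_vs_q = contrast({p + t: (1 if p == "P" else -1)
                   for p in ("P", "Q") for t in treatments})

# Overstory species Vf vs Vg, averaged over the three plantations.
vf_vs_vg = contrast({p + t: c
                     for p in plantations for t, c in (("Vf", 1), ("Vg", -1))})

# Does the Vf vs Vg difference change between plantations P and Q?
vf_vg_pq = contrast({"PVf": 1, "PVg": -1, "QVf": -1, "QVg": 1})

for name, vec in (("P vs Q", p_vs_q), ("Vf vs Vg", vf_vs_vg),
                  ("Vf vs Vg PQ interaction", vf_vg_pq)):
    print(name, vec, "sum =", sum(vec))
```

Each vector has one coefficient per level and sums to zero, which is a quick check worth doing before you type a contrast into any program.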
4. Daniela also measured the amount of light and the density of forest litter on each
plot. Fit a model that includes these covariates as predictors of the number of
saplings associated with each dispersal mechanism.
In SPSS, just enter LIGHT and LITTER in the COVARIATES box in Analyze > General
Linear Model. Results are discussed in the SAS section.
5. Check model assumptions by making a chi-square quantile plot of the residuals.
Modify your model as appropriate based on your findings.
I've already discussed how to make chi-square quantile plots in SPSS (see notes at the
very end of Principal Components Analysis).
In SPSS, when using Analyze > General Linear Model > Multivariate, click on SAVE and
then choose UNSTANDARDIZED RESIDUALS.
R Results (code only; for results see SAS section)
1. Look at interaction plots between plantation and overstory species for each of the
dispersal mechanisms. Discuss what you see.
Here is the R code. See SAS section above for results and interpretation.
#get the data
daniela=read.csv("http://www.reuningscherer.net/stat660/data/Daniela.csv", header=T)
#make interaction plots
#this statement makes 4 plots per page
par(mfrow=c(2,2))
#this makes the plots
interaction.plot(daniela$Treatment,daniela$Plantation,daniela$mammals,
lwd=3,col=c("red","blue","black"),xlab="Species",
main="Interaction Plot for Mammals")
interaction.plot(daniela$Treatment,daniela$Plantation,daniela$birds,
lwd=3,col=c("red","blue","black"),xlab="Species",
main="Interaction Plot for Birds")
interaction.plot(daniela$Treatment,daniela$Plantation,daniela$other,
lwd=3,col=c("red","blue","black"),xlab="Species",
main="Interaction Plot for Other")
2. Run MANOVA for these two categorical factors. Discuss your results, both
univariate and multivariate.
#fit linear model
mod1=manova(as.matrix(daniela[,8:10])~daniela$Treatment +
daniela$Plantation +daniela$Plantation*daniela$Treatment)
#get univariate results
summary.aov(mod1)
#get multivariate results
summary.manova(mod1)
summary.manova(mod1,test="Wilks")
3. Perform (multivariate) contrasts to compare levels of a particular factor or
combinations of factors. Discuss your results.
I'll simply say here that contrasts in R are difficult and at present not pleasant. You can
see the comments I've made in the notes about contrasts bu