So, I'm trying to compare two models, fit1 and fit2.
Initially, I was just doing anova(fit1,fit2), and this yielded output that I understood (including a p-value).
However, when I switched my models from lm()-based models to glm()-based models, anova(fit1, fit2) now yields residual degrees of freedom, residual deviances, and a Df/Deviance column pair, which I am having trouble interpreting (resources explaining these metrics seem scarce). I was hoping to extract a p-value for the comparison between the two models, but for some reason anova(fit1, fit2, test='Chisq') isn't working. Any suggestions?
I realize that, depending on the link function in my glms, Chi-squared may not be the most appropriate test, but I have also tried test='F' in appropriate contexts, with similar disappointment.
Is this problem familiar to anybody else? Suggestions? Many thanks!
Example:
make_and_compare_models <- function(fitness_trait_name, data_frame_name,
                                    vector_for_multiple_regression,
                                    predictor_for_single_regression, fam) {
  # Full model: the fitness trait regressed on all predictors in the vector
  # (collapse = "+" is needed to join the predictor names into one formula string)
  fit1 <- glm(formula = as.formula(paste(fitness_trait_name, "~",
                                         paste(vector_for_multiple_regression, collapse = "+"))),
              family = fam, data = data_frame_name)
  print("summary fit 1")
  print(summary(fit1))
  # Reduced model: the fitness trait regressed on the single predictor
  fit2 <- glm(formula = as.formula(paste(fitness_trait_name, "~", predictor_for_single_regression)),
              family = fam, data = data_frame_name)
  print("summary fit 2")
  print(summary(fit2))
  print("model comparison stats:")
  mod_test <- anova(fit2, fit1)
  ## suggestion #1: ask anova() for the test directly
  print(anova(fit2, fit1, test = "Chisq"))
  ## suggestion #2: compute the p-value by hand from the deviance table
  print("significance:")
  print(1 - pchisq(abs(mod_test$Deviance[2]), df = abs(mod_test$Df[2])))
}
data<-structure(list(ID = c(1L, 2L, 4L, 7L, 9L, 10L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 29L,
31L, 34L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 45L, 46L, 47L, 48L,
49L, 52L, 55L, 56L, 59L, 60L, 61L, 62L, 63L, 65L, 66L, 67L, 68L,
69L, 71L), QnWeight_initial = c(158L, 165L, 137L, 150L, 153L,
137L, 158L, 163L, 159L, 151L, 145L, 144L, 157L, 144L, 133L, 148L,
151L, 151L, 147L, 158L, 178L, 164L, 134L, 151L, 148L, 142L, 127L,
179L, 162L, 150L, 151L, 153L, 163L, 155L, 163L, 170L, 149L, 165L,
128L, 134L, 145L, 147L, 148L, 160L, 131L, 155L, 169L, 143L, 123L,
151L), Survived_eclosion = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Days_wrkr_eclosion_minus20 = c(NA,
1L, NA, 3L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, NA, 0L, 7L, 1L, 0L,
1L, 0L, 1L, 2L, 2L, NA, 2L, 3L, 2L, 2L, NA, 0L, 1L, NA, NA, 0L,
0L, 0L, 0L, 3L, 3L, 3L, 1L, 0L, 2L, NA, 1L, 0L, 1L, 1L, 3L, 1L,
2L), MLH = c(0.5, 0.666666667, 0.555555556, 0.25, 1, 0.5, 0.333333333,
0.7, 0.5, 0.7, 0.5, 0.666666667, 0.375, 0.4, 0.5, 0.333333333,
0.4, 0.375, 0.3, 0.5, 0.3, 0.2, 0.4, 0.875, 0.6, 0.4, 0.222222222,
0.222222222, 0.6, 0.6, 0.3, 0.4, 0.714285714, 0.4, 0.3, 0.6,
0.4, 0.7, 0.625, 0.555555556, 0.25, 0.5, 0.5, 0.6, 0.25, 0.428571429,
0.3, 0.25, 0.375, 0.555555556), Acon5 = c(0.35387674, 0.35387674,
0.35387674, 0.35387674, 0.35387674, 0.35387674, 0.35387674, 0,
0, 1, 0, 1, 0.35387674, 0, 0, 0.35387674, 1, 1, 0, 0, 0, 1, 0,
0.35387674, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0.35387674), Baez = c(1, 1, 1, 0.467836257,
1, 1, 0, 0, 1, 1, 0, 0.467836257, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0.467836257, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), C294 = c(0, 1, 0, 0, 1,
0.582542694, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
0, 1, 1, 0, 0, 0.582542694, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1), C316 = c(1, 1, 0, 0, 0.519685039,
0.519685039, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0.519685039, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0.519685039, 1, 0, 1,
1, 0, 0.519685039, 1, 0.519685039, 1, 1, 1, 0.519685039, 0.519685039,
0, 0.519685039, 0.519685039, 0), i_120_PigTail = c(1, 1, 0, 1,
0.631236443, 0.631236443, 1, 1, 1, 1, 1, 0, 0.631236443, 1, 1,
1, 0, 0.631236443, 1, 1, 1, 0, 0, 1, 1, 1, 0.631236443, 0, 1,
1, 0, 1, 0.631236443, 1, 0, 1, 0, 0, 1, 0.631236443, 0.631236443,
0, 1, 0, 0.631236443, 0.631236443, 1, 0.631236443, 0.631236443,
1), i129 = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Jackstraw_PigTail = c(0L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Neil_Young = c(0.529636711,
0, 1, 0, 0.529636711, 0.529636711, 1, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1), Ramble = c(0, 0, 0,
0, 0.215163934, 0.215163934, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.215163934, 0,
0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0.215163934, 0, 0, 0, 0), Sol_18 = c(1,
0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0.404669261,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)), .Names = c("ID", "QnWeight_initial",
"Survived_eclosion", "Days_wrkr_eclosion_minus20", "MLH", "Acon5",
"Baez", "C294", "C316", "i_120_PigTail", "i129", "Jackstraw_PigTail",
"Neil_Young", "Ramble", "Sol_18"), class = "data.frame", row.names = c(NA,
-50L))
make_and_compare_models("QnWeight_initial", data, c("Acon5","Baez","C294","C316","i_120_PigTail","i129","Jackstraw_PigTail","Neil_Young","Ramble","Sol_18"), "MLH", "gaussian")
The difference in deviance between a "larger" or more complex model and a nested or "reduced" model is distributed (asymptotically) as a chi-squared variate, with degrees of freedom equal to the difference in residual degrees of freedom between the two models. So you extract the change in deviance and the difference in degrees of freedom and compare them with pchisq(deviance, diff(df)). The "p-value" is just 1 minus that value.
> 1-pchisq(3.84,1)
[1] 0.05004352
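For reference, the first example on the glm help page (?glm, the Dobson (1990) Poisson counts) sets up the data and the full model roughly as follows:

counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)    # factor with levels 1:3, repeating in sequence
treatment <- gl(3, 3)     # factor with levels 1:3, in blocks of three
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())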
If you then fit a reduced model without the "treatment" variable and compare it to the full model, you get:
glm.D93.o <- glm(counts ~ outcome, family = poisson())
anova.res <- anova(glm.D93, glm.D93.o)
anova.res
#------------
Analysis of Deviance Table
Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome
Resid. Df Resid. Dev Df Deviance
1 4 5.1291
2 6 5.1291 -2 -2.6645e-15
#---------------
str(anova.res)
Classes ‘anova’ and 'data.frame': 2 obs. of 4 variables:
$ Resid. Df : num 4 6
$ Resid. Dev: num 5.13 5.13
$ Df : num NA -2
$ Deviance : num NA -2.66e-15
- attr(*, "heading")= chr "Analysis of Deviance Table\n" "Model 1: counts ~ outcome + treatment\nModel 2: counts ~ outcome"
So after looking at how things were stored in the object itself, this gives the p-value for "treatment" (the term dropped from the reduced model):
1-pchisq( abs(anova.res$Deviance[2]), abs(anova.res$Df[2]))
[1] 1
And this would be the corresponding procedure for the treatment+outcome model versus the treatment-only model (i.e., the test for "outcome"):
> glm.D93.t <- glm(counts ~ treatment, family=poisson())
> anova.res2 <-anova(glm.D93, glm.D93.t)
> 1-pchisq( abs(anova.res2$Deviance[2]), abs(anova.res2$Df[2]))
[1] 0.06547071
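For what it's worth, anova() with test="Chisq" should report the same p-value directly in a Pr(>Chi) column, so the manual pchisq() step is mainly useful for seeing where the number comes from:

anova(glm.D93.t, glm.D93, test = "Chisq")   # reduced model first; Pr(>Chi) should match the value above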
If your 2 models are nested, then you can use the change in deviance of the 2 models to see if the model containing extra parameters yields an improved fit. If model 1 contains k parameters and model 2 contains those same k parameters plus an additional m parameters, then the change in deviance follows an (approximately) chi-square distribution with m degrees of freedom. You can use this test statistic to see if model 2 is an improvement on model 1.
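As a minimal sketch of this on the example data posted above (using the binomial response Survived_eclosion purely for illustration; the choice of response and predictors here is an assumption, not the original model):

# Reduced model (k parameters) and full model (those k plus m extra), nested
fit_reduced <- glm(Survived_eclosion ~ MLH, family = binomial, data = data)
fit_full    <- glm(Survived_eclosion ~ MLH + QnWeight_initial, family = binomial, data = data)

# Change in deviance and the m extra parameters
dev_change <- deviance(fit_reduced) - deviance(fit_full)
m          <- df.residual(fit_reduced) - df.residual(fit_full)

# Upper-tail chi-square p-value for the improvement from the extra parameters
pchisq(dev_change, df = m, lower.tail = FALSE)

# anova() should report the same test
anova(fit_reduced, fit_full, test = "Chisq")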
If you are new to this area, I would strongly recommend reading an introductory text on GLMs.