So, I'm trying to compare two models, fit1 and fit2.
Initially, I was just doing anova(fit1,fit2), and this yielded output that I understood (including a p-value).
However, when I switched my models from lm()-based models to glm()-based models, anova(fit1, fit2) now yields residual degrees of freedom, residual deviances, and a Df/Deviance column pair, which I am having trouble interpreting (resources explaining these metrics seem scarce). I was hoping to extract a p-value for the comparison between the two models, but for some reason anova(fit1, fit2, test='Chisq') isn't working. Any suggestions?
I realize that, depending on the link function in my glms, Chi-squared may not be the most appropriate test, but I have also tried test='F' in appropriate contexts, with similar disappointment.
Is this problem familiar to anybody else? Suggestions? Many thanks!
Example:
make_and_compare_models <- function(fitness_trait_name, data_frame_name,
                                    vector_for_multiple_regression,
                                    predictor_for_single_regression, fam) {
  # Full model: the fitness trait regressed on all predictors in the vector
  # (collapse = "+" is needed to join the predictor names into one formula string)
  fit1 <- glm(formula = as.formula(paste(fitness_trait_name, "~",
                                         paste(vector_for_multiple_regression, collapse = "+"))),
              family = fam, data = data_frame_name)
  print("summary fit 1")
  print(summary(fit1))
  # Reduced model: the fitness trait regressed on the single predictor
  fit2 <- glm(formula = as.formula(paste(fitness_trait_name, "~", predictor_for_single_regression)),
              family = fam, data = data_frame_name)
  print("summary fit 2")
  print(summary(fit2))
  print("model comparison stats:")
  mod_test <- anova(fit2, fit1)
  ## suggestion #1: ask anova() for the test directly
  print(anova(fit2, fit1, test = "Chisq"))
  ## suggestion #2: compute the p-value by hand from the deviance table
  print("significance:")
  print(1 - pchisq(abs(mod_test$Deviance[2]), df = abs(mod_test$Df[2])))
}
data<-structure(list(ID = c(1L, 2L, 4L, 7L, 9L, 10L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 29L,
31L, 34L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 45L, 46L, 47L, 48L,
49L, 52L, 55L, 56L, 59L, 60L, 61L, 62L, 63L, 65L, 66L, 67L, 68L,
69L, 71L), QnWeight_initial = c(158L, 165L, 137L, 150L, 153L,
137L, 158L, 163L, 159L, 151L, 145L, 144L, 157L, 144L, 133L, 148L,
151L, 151L, 147L, 158L, 178L, 164L, 134L, 151L, 148L, 142L, 127L,
179L, 162L, 150L, 151L, 153L, 163L, 155L, 163L, 170L, 149L, 165L,
128L, 134L, 145L, 147L, 148L, 160L, 131L, 155L, 169L, 143L, 123L,
151L), Survived_eclosion = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Days_wrkr_eclosion_minus20 = c(NA,
1L, NA, 3L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, NA, 0L, 7L, 1L, 0L,
1L, 0L, 1L, 2L, 2L, NA, 2L, 3L, 2L, 2L, NA, 0L, 1L, NA, NA, 0L,
0L, 0L, 0L, 3L, 3L, 3L, 1L, 0L, 2L, NA, 1L, 0L, 1L, 1L, 3L, 1L,
2L), MLH = c(0.5, 0.666666667, 0.555555556, 0.25, 1, 0.5, 0.333333333,
0.7, 0.5, 0.7, 0.5, 0.666666667, 0.375, 0.4, 0.5, 0.333333333,
0.4, 0.375, 0.3, 0.5, 0.3, 0.2, 0.4, 0.875, 0.6, 0.4, 0.222222222,
0.222222222, 0.6, 0.6, 0.3, 0.4, 0.714285714, 0.4, 0.3, 0.6,
0.4, 0.7, 0.625, 0.555555556, 0.25, 0.5, 0.5, 0.6, 0.25, 0.428571429,
0.3, 0.25, 0.375, 0.555555556), Acon5 = c(0.35387674, 0.35387674,
0.35387674, 0.35387674, 0.35387674, 0.35387674, 0.35387674, 0,
0, 1, 0, 1, 0.35387674, 0, 0, 0.35387674, 1, 1, 0, 0, 0, 1, 0,
0.35387674, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0.35387674), Baez = c(1, 1, 1, 0.467836257,
1, 1, 0, 0, 1, 1, 0, 0.467836257, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0.467836257, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), C294 = c(0, 1, 0, 0, 1,
0.582542694, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
0, 1, 1, 0, 0, 0.582542694, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1), C316 = c(1, 1, 0, 0, 0.519685039,
0.519685039, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0.519685039, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0.519685039, 1, 0, 1,
1, 0, 0.519685039, 1, 0.519685039, 1, 1, 1, 0.519685039, 0.519685039,
0, 0.519685039, 0.519685039, 0), i_120_PigTail = c(1, 1, 0, 1,
0.631236443, 0.631236443, 1, 1, 1, 1, 1, 0, 0.631236443, 1, 1,
1, 0, 0.631236443, 1, 1, 1, 0, 0, 1, 1, 1, 0.631236443, 0, 1,
1, 0, 1, 0.631236443, 1, 0, 1, 0, 0, 1, 0.631236443, 0.631236443,
0, 1, 0, 0.631236443, 0.631236443, 1, 0.631236443, 0.631236443,
1), i129 = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Jackstraw_PigTail = c(0L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Neil_Young = c(0.529636711,
0, 1, 0, 0.529636711, 0.529636711, 1, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1), Ramble = c(0, 0, 0,
0, 0.215163934, 0.215163934, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.215163934, 0,
0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0.215163934, 0, 0, 0, 0), Sol_18 = c(1,
0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0.404669261,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)), .Names = c("ID", "QnWeight_initial",
"Survived_eclosion", "Days_wrkr_eclosion_minus20", "MLH", "Acon5",
"Baez", "C294", "C316", "i_120_PigTail", "i129", "Jackstraw_PigTail",
"Neil_Young", "Ramble", "Sol_18"), class = "data.frame", row.names = c(NA,
-50L))
make_and_compare_models("QnWeight_initial", data, c("Acon5","Baez","C294","C316","i_120_PigTail","i129","Jackstraw_PigTail","Neil_Young","Ramble","Sol_18"), "MLH", "gaussian")
The difference in deviance between a "larger" or more complex model and a nested or "reduced" model is distributed (asymptotically) as a chi-squared variate, with degrees of freedom equal to the difference in residual degrees of freedom between the two models. So you extract the change in deviance and the difference in degrees of freedom and compare them with pchisq(deviance, diff(df)). The "p-value" is just 1 minus that value.
> 1-pchisq(3.84,1)
[1] 0.05004352
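For reference, the first example on the glm help page (?glm, the Dobson (1990) Poisson counts) sets up the data and the full model roughly as follows:

counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)    # factor with levels 1:3, repeating in sequence
treatment <- gl(3, 3)     # factor with levels 1:3, in blocks of three
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())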
If you then fit a reduced model without the "treatment" variable and compare it to the full model, you get:
glm.D93.o <- glm(counts ~ outcome, family = poisson())
anova.res <- anova(glm.D93, glm.D93.o)
anova.res
#------------
Analysis of Deviance Table
Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome
Resid. Df Resid. Dev Df Deviance
1 4 5.1291
2 6 5.1291 -2 -2.6645e-15
#---------------
str(anova.res)
Classes ‘anova’ and 'data.frame': 2 obs. of 4 variables:
$ Resid. Df : num 4 6
$ Resid. Dev: num 5.13 5.13
$ Df : num NA -2
$ Deviance : num NA -2.66e-15
- attr(*, "heading")= chr "Analysis of Deviance Table\n" "Model 1: counts ~ outcome + treatment\nModel 2: counts ~ outcome"
So after looking at how things were stored in the object itself, this gives the p-value for "treatment" (the term dropped from the reduced model):
1-pchisq( abs(anova.res$Deviance[2]), abs(anova.res$Df[2]))
[1] 1
And this would be the corresponding procedure for the treatment+outcome model versus the treatment-only model (i.e., the test for "outcome"):
> glm.D93.t <- glm(counts ~ treatment, family=poisson())
> anova.res2 <-anova(glm.D93, glm.D93.t)
> 1-pchisq( abs(anova.res2$Deviance[2]), abs(anova.res2$Df[2]))
[1] 0.06547071
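For what it's worth, anova() with test="Chisq" should report the same p-value directly in a Pr(>Chi) column, so the manual pchisq() step is mainly useful for seeing where the number comes from:

anova(glm.D93.t, glm.D93, test = "Chisq")   # reduced model first; Pr(>Chi) should match the value above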
If your 2 models are nested, then you can use the change in deviance of the 2 models to see if the model containing extra parameters yields an improved fit. If model 1 contains k parameters and model 2 contains those same k parameters plus an additional m parameters, then the change in deviance follows an (approximately) chi-square distribution with m degrees of freedom. You can use this test statistic to see if model 2 is an improvement on model 1.
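As a minimal sketch of this on the example data posted above (using the binomial response Survived_eclosion purely for illustration; the choice of response and predictors here is an assumption, not the original model):

# Reduced model (k parameters) and full model (those k plus m extra), nested
fit_reduced <- glm(Survived_eclosion ~ MLH, family = binomial, data = data)
fit_full    <- glm(Survived_eclosion ~ MLH + QnWeight_initial, family = binomial, data = data)

# Change in deviance and the m extra parameters
dev_change <- deviance(fit_reduced) - deviance(fit_full)
m          <- df.residual(fit_reduced) - df.residual(fit_full)

# Upper-tail chi-square p-value for the improvement from the extra parameters
pchisq(dev_change, df = m, lower.tail = FALSE)

# anova() should report the same test
anova(fit_reduced, fit_full, test = "Chisq")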
If you are new to this area, I would strongly recommend reading an introductory text on GLMs.