How to extract a p-value when performing anova() between two glm models in R

So, I'm trying to compare two models, fit1 and fit2.

Initially, I was just doing anova(fit1,fit2), and this yielded output that I understood (including a p-value).

However, when I switched my models from lm()-based models to glm()-based models, anova(fit1, fit2) now yields an analysis-of-deviance table (Resid. Df, Resid. Dev, Df, and Deviance columns), which I am having trouble interpreting (resources explaining these quantities seem scarce). I was hoping to extract a p-value for the comparison between the two models, but for some reason anova(fit1, fit2, test='Chisq') isn't working. Any suggestions?

I realize that, depending on the family of my glms, a chi-squared test may not be the most appropriate choice, but I have tried test='F' in the appropriate contexts as well, with similar disappointment.

Is this problem familiar to anybody else? Suggestions? Many thanks!

Example:

make_and_compare_models <- function(fitness_trait_name, data_frame_name,
                                    vector_for_multiple_regression,
                                    predictor_for_single_regression, fam){

        ## full model: note collapse = "+" (not sep = "+") so the predictors
        ## are joined into a single formula string
        fit1 <- glm(formula = as.formula(paste(fitness_trait_name, "~",
                        paste(vector_for_multiple_regression, collapse = "+"))),
                    family = fam, data = data_frame_name)
        print("summary fit 1")
        print(summary(fit1))

        ## reduced model with a single predictor
        fit2 <- glm(formula = as.formula(paste(fitness_trait_name, "~",
                        predictor_for_single_regression)),
                    family = fam, data = data_frame_name)
        print("summary fit 2")
        print(summary(fit2))

        print("model comparison stats:")
        mod_test <- anova(fit2, fit1)

        ## suggestion #1: ask anova() for the test directly
        print(anova(fit2, fit1, test = "Chisq"))

        ## suggestion #2: compute the p-value by hand from the deviance table
        print("significance:")
        print(1 - pchisq(abs(mod_test$Deviance[2]), df = abs(mod_test$Df[2])))
}


data<-structure(list(ID = c(1L, 2L, 4L, 7L, 9L, 10L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 29L, 
31L, 34L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 45L, 46L, 47L, 48L, 
49L, 52L, 55L, 56L, 59L, 60L, 61L, 62L, 63L, 65L, 66L, 67L, 68L, 
69L, 71L), QnWeight_initial = c(158L, 165L, 137L, 150L, 153L, 
137L, 158L, 163L, 159L, 151L, 145L, 144L, 157L, 144L, 133L, 148L, 
151L, 151L, 147L, 158L, 178L, 164L, 134L, 151L, 148L, 142L, 127L, 
179L, 162L, 150L, 151L, 153L, 163L, 155L, 163L, 170L, 149L, 165L, 
128L, 134L, 145L, 147L, 148L, 160L, 131L, 155L, 169L, 143L, 123L, 
151L), Survived_eclosion = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Days_wrkr_eclosion_minus20 = c(NA, 
1L, NA, 3L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, NA, 0L, 7L, 1L, 0L, 
1L, 0L, 1L, 2L, 2L, NA, 2L, 3L, 2L, 2L, NA, 0L, 1L, NA, NA, 0L, 
0L, 0L, 0L, 3L, 3L, 3L, 1L, 0L, 2L, NA, 1L, 0L, 1L, 1L, 3L, 1L, 
2L), MLH = c(0.5, 0.666666667, 0.555555556, 0.25, 1, 0.5, 0.333333333, 
0.7, 0.5, 0.7, 0.5, 0.666666667, 0.375, 0.4, 0.5, 0.333333333, 
0.4, 0.375, 0.3, 0.5, 0.3, 0.2, 0.4, 0.875, 0.6, 0.4, 0.222222222, 
0.222222222, 0.6, 0.6, 0.3, 0.4, 0.714285714, 0.4, 0.3, 0.6, 
0.4, 0.7, 0.625, 0.555555556, 0.25, 0.5, 0.5, 0.6, 0.25, 0.428571429, 
0.3, 0.25, 0.375, 0.555555556), Acon5 = c(0.35387674, 0.35387674, 
0.35387674, 0.35387674, 0.35387674, 0.35387674, 0.35387674, 0, 
0, 1, 0, 1, 0.35387674, 0, 0, 0.35387674, 1, 1, 0, 0, 0, 1, 0, 
0.35387674, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 
0, 0, 1, 0, 0, 0, 1, 0, 0.35387674), Baez = c(1, 1, 1, 0.467836257, 
1, 1, 0, 0, 1, 1, 0, 0.467836257, 1, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0.467836257, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), C294 = c(0, 1, 0, 0, 1, 
0.582542694, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 
0, 1, 1, 0, 0, 0.582542694, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1), C316 = c(1, 1, 0, 0, 0.519685039, 
0.519685039, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0.519685039, 0, 
1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0.519685039, 1, 0, 1, 
1, 0, 0.519685039, 1, 0.519685039, 1, 1, 1, 0.519685039, 0.519685039, 
0, 0.519685039, 0.519685039, 0), i_120_PigTail = c(1, 1, 0, 1, 
0.631236443, 0.631236443, 1, 1, 1, 1, 1, 0, 0.631236443, 1, 1, 
1, 0, 0.631236443, 1, 1, 1, 0, 0, 1, 1, 1, 0.631236443, 0, 1, 
1, 0, 1, 0.631236443, 1, 0, 1, 0, 0, 1, 0.631236443, 0.631236443, 
0, 1, 0, 0.631236443, 0.631236443, 1, 0.631236443, 0.631236443, 
1), i129 = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L), Jackstraw_PigTail = c(0L, 1L, 1L, 0L, 
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 
0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Neil_Young = c(0.529636711, 
0, 1, 0, 0.529636711, 0.529636711, 1, 1, 0, 1, 1, 1, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1), Ramble = c(0, 0, 0, 
0, 0.215163934, 0.215163934, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.215163934, 0, 
0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0.215163934, 0, 0, 0, 0), Sol_18 = c(1, 
0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0.404669261, 
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)), .Names = c("ID", "QnWeight_initial", 
"Survived_eclosion", "Days_wrkr_eclosion_minus20", "MLH", "Acon5", 
"Baez", "C294", "C316", "i_120_PigTail", "i129", "Jackstraw_PigTail", 
"Neil_Young", "Ramble", "Sol_18"), class = "data.frame", row.names = c(NA, 
-50L))

make_and_compare_models("QnWeight_initial", data, c("Acon5","Baez","C294","C316","i_120_PigTail","i129","Jackstraw_PigTail","Neil_Young","Ramble","Sol_18"), "MLH", "gaussian")
Asked Nov 05 '12 by Atticus29

2 Answers

The difference in deviance between a "larger" (more complex) model and a nested "reduced" model is distributed (asymptotically) as a chi-squared variate with degrees of freedom equal to the difference in the two models' degrees of freedom. So you extract the change in deviance and the difference in degrees of freedom and compare them with pchisq(deviance_change, df_difference); the "p-value" is just 1 minus that value.

> 1-pchisq(3.84,1)
[1] 0.05004352
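
Equivalently, you can ask pchisq for the upper tail directly, which avoids the 1 - x subtraction (and the loss of precision it can cause for very small p-values):

> pchisq(3.84, df = 1, lower.tail = FALSE)
[1] 0.05004352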

If you run the first example on the glm help page (which defines counts, outcome, and treatment and fits glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())) and then add a reduced model without the treatment term, you get:

glm.D93.o <- glm(counts ~ outcome, family = poisson())
anova.res <- anova(glm.D93, glm.D93.o)
anova.res
#------------
Analysis of Deviance Table

Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome
  Resid. Df Resid. Dev Df    Deviance
1         4     5.1291               
2         6     5.1291 -2 -2.6645e-15
#---------------
 str(anova.res)
Classes ‘anova’ and 'data.frame':   2 obs. of  4 variables:
 $ Resid. Df : num  4 6
 $ Resid. Dev: num  5.13 5.13
 $ Df        : num  NA -2
 $ Deviance  : num  NA -2.66e-15
 - attr(*, "heading")= chr  "Analysis of Deviance Table\n" "Model 1: counts ~ outcome + treatment\nModel 2: counts ~ outcome"

So, after looking at how things are stored in the object itself, this gives the p-value for dropping treatment (essentially 1 here, since treatment explains nothing in these data):

 1-pchisq( abs(anova.res$Deviance[2]), abs(anova.res$Df[2]))
[1] 1

And this is the corresponding procedure for testing outcome: the treatment+outcome model versus the treatment-only model:

> glm.D93.t <- glm(counts ~ treatment, family=poisson())
> anova.res2 <-anova(glm.D93, glm.D93.t)
> 1-pchisq( abs(anova.res2$Deviance[2]), abs(anova.res2$Df[2]))
[1] 0.06547071
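
For comparison, the same p-value can be read straight from the analysis-of-deviance table when a test is requested, since anova() then appends a Pr(>Chi) column; using the glm.D93 and glm.D93.t models fitted above:

res3 <- anova(glm.D93, glm.D93.t, test = "Chisq")  # table now includes a Pr(>Chi) column
res3[["Pr(>Chi)"]][2]                              # same value as the manual calculation above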
Answered by IRTFM

If your two models are nested, then you can use the change in deviance between them to see whether the model containing the extra parameters yields an improved fit. If model 1 contains k parameters and model 2 contains those same k parameters plus an additional m parameters, then the change in deviance follows (approximately) a chi-squared distribution with m degrees of freedom. You can use this test statistic to see whether model 2 is an improvement over model 1.
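
A generic sketch of that likelihood-ratio comparison in R; the data frame d and the predictors x1 and x2 are illustrative placeholders, not objects from the question:

## model 1: k parameters; model 2: the same k plus m extra parameters
small <- glm(y ~ x1,      family = poisson, data = d)
big   <- glm(y ~ x1 + x2, family = poisson, data = d)

dev_change <- deviance(small) - deviance(big)        # change in deviance
m          <- df.residual(small) - df.residual(big)  # number of extra parameters
pchisq(dev_change, df = m, lower.tail = FALSE)       # approximate p-value

## or let anova() do the bookkeeping:
anova(small, big, test = "Chisq")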

If you are new to this area, I would strongly recommend reading an introductory text on GLMs.

Answered by mathematician1975