Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is AIC calculated in stepAIC

Tags:

r

Here is a very simple lm model from ?lm

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)

If I use stepAIC to lm.D9, on the very first line it says AIC = -12.58

require(MASS)
stepAIC(lm.D9)

If I use AIC directly on lm.D9, it gives a different value 46.17648

AIC(lm.D9)

My question is why the 2 AIC values are different. Thanks!

like image 415
FMZ Avatar asked Dec 04 '11 07:12

FMZ


1 Answers

This was annoying me, so I decided to work it out from first principles.

Re-fit the model:

d <- data.frame(weight=
                c(ctl=c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14),
                  trt=c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)),
                group=gl(2,10,20, labels=c("Ctl","Trt")))
lm.D9 <- lm(weight ~ group, d)

Values returned by standard accessors:

(AIC1 <- AIC(lm.D9))
> 46.17468
(LL1 <- logLik(lm.D9))
> -20.08824 (df=3)

Reconstruct from first principles:

n <- nrow(d)
ss0 <- summary(lm.D9)$sigma
ss <- ss0*(n-1)/n
(LL2 <- sum(dnorm(d$weight,fitted(lm.D9),
                 sd=ss,log=TRUE)))
> -20.08828

This is a tiny bit off, haven't found the glitch.

Number of parameters:

npar <- length(coef(lm.D9))+1 


(AIC2 <- -2*LL2+2*npar)
> 46.1756

Still off by more than numeric fuzz, but only about one part in a million.

Now let's see what stepAIC is doing:

MASS::stepAIC(lm.D9)  ## start: AIC = -12.58
extractAIC(lm.D9)     ## same value (see MASS::stepAIC for details)
stats:::extractAIC.lm ## examine the code


RSS1 <- deviance(lm.D9)   ## UNSCALED sum of squares
RSS2 <- sum((d$weight-fitted(lm.D9))^2)  ## ditto, from first principles
AIC3 <- n*log(RSS1/n)+2*2 ## formula used within extractAIC

You can work out the formula used above from sigma-hat=RSS/n -- or see Venables and Ripley MASS for the derivation.

Add missing terms: uncounted variance parameter, plus normalization constant

(AIC3 + 2 - 2*(-n/2*(log(2*pi)+1)))

This is exactly the same (to 1e-14) as AIC1 above

like image 98
Ben Bolker Avatar answered Oct 13 '22 23:10

Ben Bolker