 

Plotting profile likelihood curves in R

Tags:

r

I am trying to figure out how to plot the profile likelihood curve of a GLM parameter, with the 95% profile likelihood confidence intervals (pLCIs) on the same plot. The example I have been working with is below. The plots I get are not the likelihood curves I was expecting: the y-axis is tau, whereas I would like it to be the likelihood, so that the curve peaks at the parameter estimate. I am not sure where to find those likelihood values. I may just be misinterpreting the theory behind this. Thanks for any help you can give.

Max

clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
glm2<-glm(lot2 ~ log(u), data=clotting, family=Gamma)
prof<-profile(glm2)
plot(prof) 
ADW11 asked Aug 11 '12 16:08


2 Answers

Regenerate your example:

clotting <- data.frame(
                       u = c(5,10,15,20,30,40,60,80,100),
                       lot1 = c(118,58,42,35,27,25,21,19,18),
                       lot2 = c(69,35,26,21,18,16,13,12,12))
glm2 <- glm(lot2 ~ log(u), data=clotting, family=Gamma)

The profile.glm function actually lives in the MASS package:

library(MASS)
prof<-profile(glm2)

To figure out what profile.glm and plot.profile are doing, see ?profile.glm and ?plot.profile. To dig into the profile object, it is also useful to examine the code of MASS:::profile.glm and MASS:::plot.profile. Basically, what these tell you is that profile returns the signed square root of the difference between the deviance and the minimum deviance, divided by the dispersion parameter: tau = sign * sqrt((deviance - minimum deviance) / dispersion), where the sign is negative below the parameter estimate and positive above it. This is done so that a perfectly quadratic profile appears as a straight line (it is much easier to detect deviations from a straight line than from a parabola by eye).
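
One way to convince yourself of that description is to refit the model with the profiled coefficient held fixed and compare deviances. The following is only a minimal sketch, not code from MASS: the offset-based refit and the names fit, slope_prof, stat and dev_refit are assumptions made for illustration. It reads the statistic from the first column of the profile data frame rather than by name.

```r
library(MASS)  # provides the profile method for glm objects, as described above

clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
fit  <- glm(lot2 ~ log(u), data = clotting, family = Gamma)
prof <- profile(fit)

disp   <- summary(fit)$dispersion  # dispersion estimate of the full fit
mindev <- deviance(fit)            # minimum (unconstrained) deviance

# Profile for the slope: signed root statistic and profiled slope values
slope_prof <- prof[["log(u)"]]
stat       <- slope_prof[[1]]
slope_vals <- slope_prof$par.vals[, "log(u)"]

# Hypothetical check: hold the slope fixed through an offset, refit the
# intercept only, and record the deviance of each constrained fit
dev_refit <- sapply(slope_vals, function(b) {
  deviance(glm(lot2 ~ 1 + offset(b * log(u)), data = clotting, family = Gamma))
})

# Back-transformed profile deviance next to the constrained-fit deviance;
# the two columns should agree up to convergence tolerance
cbind(from_profile = stat^2 * disp + mindev, from_refit = dev_refit)
```

If the two columns match, the statistic is indeed the signed root of the scaled deviance difference, which is what the back-transformation further down relies on.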

The other thing that may be useful to know is how the profile is stored. Basically, it's a list of data frames (one for each parameter profiled), except that the individual data frames are a little bit weird (containing one vector component and one matrix component).

> str(prof)
List of 2
 $ (Intercept):'data.frame':    12 obs. of  3 variables:
  ..$ tau     : num [1:12] -3.557 -2.836 -2.12 -1.409 -0.702 ...
  ..$ par.vals: num [1:12, 1:2] -0.0286 -0.0276 -0.0267 -0.0258 -0.0248 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "(Intercept)" "log(u)"
  ..$ dev     : num [1:12] 0.00622 0.00753 0.00883 0.01012 0.0114 ...
 $ log(u)     :'data.frame':    12 obs. of  2 variables:
  ..$ tau     : num [1:12] -3.516 -2.811 -2.106 -1.403 -0.701 ...
  ..$ par.vals: num [1:12, 1:2] -0.0195 -0.0204 -0.0213 -0.0222 -0.023 ...
  .. ..- attr(*, "dimnames")=List of 2
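
To make the matrix column less mysterious, here is a small sketch of pulling one parameter's profile apart into an ordinary flat data frame. The names p_slope and flat are made up for illustration. The statistic is taken by position because the column is called tau in the output above but z in the code of the second answer.

```r
library(MASS)

clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
fit  <- glm(lot2 ~ log(u), data = clotting, family = Gamma)
prof <- profile(fit)

# One list element per profiled coefficient, indexed by coefficient name
p_slope <- prof[["log(u)"]]

# par.vals is a matrix stored as a single data-frame column:
# one row per profile point, one column per model coefficient
dim(p_slope$par.vals)
colnames(p_slope$par.vals)

# Flatten into ordinary columns: the statistic, the profiled slope, and the
# intercept re-estimated at each fixed slope value
stat <- p_slope[[1]]
flat <- data.frame(stat      = stat,
                   slope     = p_slope$par.vals[, "log(u)"],
                   intercept = p_slope$par.vals[, "(Intercept)"])
head(flat)
```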

It also contains attributes summary and original.fit that you can use to recover the dispersion and minimum deviance:

disp <- attr(prof,"summary")$dispersion
mindev <- attr(prof,"original.fit")$deviance

Now reverse the transformation for parameter 1:

dev1 <- prof[[1]]$tau^2
dev2 <- dev1*disp+mindev

Plot:

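## x-axis: profiled values of parameter 1 (plotting against tau would just give an exact parabola)
plot(prof[[1]]$par.vals[,1],dev2,type="b")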

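(This is a plot of the deviance. Dividing by the dispersion and multiplying by 0.5 gives the negative log-likelihood, or by -0.5 the log-likelihood, in each case up to an additive constant. For families whose dispersion is fixed at 1, such as binomial or Poisson, no division is needed.)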
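
To get closer to what the question asks for, a curve that peaks at the estimate with the 95% limits marked, one option is to plot the relative profile log-likelihood, -0.5 * tau^2, against the profiled parameter values and overlay the limits returned by confint() on the same profile object. This is a sketch under assumptions rather than a definitive recipe: it treats the dispersion as fixed at its estimate, and the horizontal reference line assumes that confint() applies a normal cutoff on the tau scale. The names rel_loglik and ci are illustrative.

```r
library(MASS)

clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
fit  <- glm(lot2 ~ log(u), data = clotting, family = Gamma)
prof <- profile(fit)

p_slope    <- prof[["log(u)"]]
stat       <- p_slope[[1]]                     # signed root statistic
slope_vals <- p_slope$par.vals[, "log(u)"]     # profiled slope values

# Relative profile log-likelihood: 0 at the estimate, negative elsewhere
rel_loglik <- -0.5 * stat^2

# 95% profile-likelihood limits computed from the same profile object
ci <- confint(prof, parm = "log(u)", level = 0.95)

plot(slope_vals, rel_loglik, type = "b",
     xlab = "coefficient of log(u)",
     ylab = "relative profile log-likelihood")
abline(v = coef(fit)["log(u)"], lty = 3)       # point estimate
abline(v = ci, lty = 2)                        # 95% limits
abline(h = -0.5 * qnorm(0.975)^2, lty = 2)     # assumed cutoff on this scale
```

Using exp(rel_loglik) on the y-axis instead gives the relative likelihood, which equals 1 at the estimate.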

edit: some more general functions to transform the profile into a useful format for lattice/ggplot plotting ...

tmpf <- function(x,n) {
    data.frame(par=n,tau=x$tau,
               deviance=x$tau^2*disp+mindev,
               x$par.vals,check.names=FALSE)
}
pp <- do.call(rbind,mapply(tmpf,prof,names(prof),SIMPLIFY=FALSE))
library(reshape2)
pp2 <- melt(pp,id.var=1:3)
pp3 <- subset(pp2,par==variable,select=-variable)

Now plot it with lattice:

library(lattice)
xyplot(deviance~value|par,type="b",data=pp3,
       scales=list(x=list(relation="free")))


Or with ggplot2:

library(ggplot2)
ggplot(pp3,aes(value,deviance))+geom_line()+geom_point()+
    facet_wrap(~par,scale="free_x")

Ben Bolker answered Sep 22 '22 12:09


FYI, for fun, I took the above and whipped it together into a single function using purrr::imap_dfr as I couldn't find a package that implements the above.

get_profile_glm <- function(aglm){
  prof <- MASS:::profile.glm(aglm)
  disp <- attr(prof,"summary")$dispersion

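  # .x is one parameter's profile and .y its name. The first column of .x is
  # the signed root statistic (named tau or z, depending on the MASS version).
  purrr::imap_dfr(prof, .f = ~data.frame(par = .y,
                                deviance=.x[[1]]^2*disp+aglm$deviance,
                                values = as.data.frame(.x$par.vals)[[.y]],
                                stringsAsFactors = FALSE))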

}

Works great!

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

library(ggplot2)
ggplot(get_profile_glm(glm.D93), aes(x = values, y = deviance)) +
  geom_point() +
  geom_line() +
  facet_wrap(~par, scale = "free_x")

jebyrnes answered Sep 22 '22 12:09