Confidence intervals for predictions from logistic regression

In R, predict.lm computes predictions from a fitted linear regression and can also return confidence intervals for those predictions. According to the manual, these intervals are based on the error variance of fitting, but not on the error intervals of the coefficients.

On the other hand, predict.glm, which computes predictions from logistic and Poisson regression (amongst a few others), doesn't have an option for confidence intervals. I even have a hard time imagining how such confidence intervals could be computed so as to provide meaningful insight for Poisson and logistic regression.

Are there cases in which it is meaningful to provide confidence intervals for such predictions? How can they be interpreted? And what are the assumptions in these cases?

unique2 asked Jan 20 '13 09:01


1 Answer

The usual way is to compute a confidence interval on the scale of the linear predictor, where things will be more normal (Gaussian) and then apply the inverse of the link function to map the confidence interval from the linear predictor scale to the response scale.
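As a rough sketch of what this means for a logistic model, the interval can be written out as follows. The symbols here (the linear predictor \hat{\eta}, the link g, the normal quantile z) are notation introduced for illustration and are not taken from the R documentation.

```latex
\begin{align}
\hat{\eta}(x) &= x^{\top}\hat{\beta}, \qquad
\widehat{\mathrm{se}}\{\hat{\eta}(x)\} = \sqrt{x^{\top}\,\widehat{\mathrm{Cov}}(\hat{\beta})\,x} \\
[\eta_L,\ \eta_U] &= \hat{\eta}(x) \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}\{\hat{\eta}(x)\} \\
[p_L,\ p_U] &= \bigl[\,g^{-1}(\eta_L),\ g^{-1}(\eta_U)\,\bigr], \qquad
g^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}
\end{align}
```

Because the inverse logit is increasing, the endpoints keep their order and always fall between 0 and 1. Note that this is an interval for the fitted probability (the mean response) at x, not a prediction interval for a new 0/1 observation.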

To do this you need to call predict() with two arguments:

  1. type = "link", and
  2. se.fit = TRUE.

The first produces predictions on the scale of the linear predictor; the second also returns the standard errors of those predictions. In code:

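    ## Working example data: mpg as predictor x, vs (0/1) as response y
    foo <- mtcars[, c("mpg", "vs")]
    names(foo) <- c("x", "y")
    ## Logistic regression
    mod <- glm(y ~ x, data = foo, family = binomial)
    ## Grid of x values to predict at
    preddata <- with(foo, data.frame(x = seq(min(x), max(x), length.out = 100)))
    ## Predictions and standard errors on the link (log-odds) scale
    preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)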

preds is then a list with components fit and se.fit (plus residual.scale).

The confidence interval on the linear predictor is then

    critval <- 1.96 ## approx 95% CI
    upr <- preds$fit + (critval * preds$se.fit)
    lwr <- preds$fit - (critval * preds$se.fit)
    fit <- preds$fit

critval is chosen from a t or z (normal) distribution as required (I forget exactly now which to use for which type of GLM and what the properties are), with the coverage required. The value 1.96 is the quantile of the standard Gaussian distribution that gives 95% two-sided coverage:

    > qnorm(0.975) ## 0.975 as this is upper tail, 2.5% also in lower tail
    [1] 1.959964
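For a coverage level other than 95%, the critical value can be computed rather than hard-coded. The following is a minimal sketch; the names level, alpha, crit_z and crit_t are illustrative. Which distribution to use is a modelling choice: a common convention, and the one summary.glm follows for its tests, is z for binomial and Poisson families (fixed dispersion) and t when the dispersion is estimated.

```r
level <- 0.95                      # desired two-sided coverage (assumed value)
alpha <- 1 - level

## Normal (z) critical value
crit_z <- qnorm(1 - alpha / 2)

## t critical value, using the residual degrees of freedom of a fitted model
foo <- mtcars[, c("mpg", "vs")]
names(foo) <- c("x", "y")
mod <- glm(y ~ x, data = foo, family = binomial)
crit_t <- qt(1 - alpha / 2, df = df.residual(mod))
```

The t value is always somewhat larger than the z value, so it gives slightly wider intervals; the difference shrinks as the residual degrees of freedom grow.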

Now for fit, upr and lwr we need to apply the inverse of the link function to them.

    fit2 <- mod$family$linkinv(fit)
    upr2 <- mod$family$linkinv(upr)
    lwr2 <- mod$family$linkinv(lwr)
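The steps so far can be collected into one function. This is a sketch only: predict_ci is a hypothetical helper name, not something provided by base R, and it assumes a z critical value and an increasing inverse link (true for logit and log; for a decreasing inverse link such as the Gamma family's default, lwr and upr would come out swapped).

```r
## Hypothetical helper (not part of base R): Wald-type confidence interval for
## the mean response of a fitted glm, computed on the link scale and then
## back-transformed with the family's inverse link.
predict_ci <- function(mod, newdata, level = 0.95) {
  preds   <- predict(mod, newdata = newdata, type = "link", se.fit = TRUE)
  critval <- qnorm(1 - (1 - level) / 2)
  linkinv <- family(mod)$linkinv
  data.frame(
    newdata,
    fit = linkinv(preds$fit),
    lwr = linkinv(preds$fit - critval * preds$se.fit),
    upr = linkinv(preds$fit + critval * preds$se.fit)
  )
}

## Usage with the same example data as above
foo <- mtcars[, c("mpg", "vs")]
names(foo) <- c("x", "y")
mod <- glm(y ~ x, data = foo, family = binomial)
preddata <- data.frame(x = seq(min(foo$x), max(foo$x), length.out = 100))
ci <- predict_ci(mod, preddata)
head(ci)
```

Because the interval is built on the link scale, the back-transformed limits for a binomial model cannot leave the range 0 to 1, which a naive "fit plus or minus 1.96 standard errors" on the response scale does not guarantee.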

Now you can plot all three and the data.

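    library(ggplot2)

    ## Attach the back-transformed limits to the prediction grid
    preddata$lwr <- lwr2
    preddata$upr <- upr2

    ## Data, ggplot's own GLM fit and band, and our limits in red
    ggplot(data = foo, mapping = aes(x = x, y = y)) +
        geom_point() +
        stat_smooth(method = "glm", method.args = list(family = binomial)) +
        geom_line(data = preddata, mapping = aes(x = x, y = upr), col = "red") +
        geom_line(data = preddata, mapping = aes(x = x, y = lwr), col = "red")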
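The same recipe is not specific to logistic regression. As an illustration for the Poisson case raised in the question, here is a sketch using the built-in warpbreaks data; the model formula is chosen only for demonstration, and the sketch ignores possible overdispersion.

```r
## Poisson regression with the default log link
pmod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

## All combinations of the two factors
newd <- expand.grid(wool    = levels(warpbreaks$wool),
                    tension = levels(warpbreaks$tension))

## Predictions and standard errors on the link (log) scale
p <- predict(pmod, newdata = newd, type = "link", se.fit = TRUE)
critval <- qnorm(0.975)            # approx 95% coverage
linkinv <- family(pmod)$linkinv    # exp() for the log link

## Back-transform to the scale of expected counts
newd$fit <- linkinv(p$fit)
newd$lwr <- linkinv(p$fit - critval * p$se.fit)
newd$upr <- linkinv(p$fit + critval * p$se.fit)
newd
```

Here the limits are intervals for the expected count in each group and are always positive, since they are obtained by exponentiating limits on the log scale.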


Gavin Simpson answered Sep 28 '22 06:09