I am going to fit the model on the data provided in this excellent example on how to compute the 95% confidence interval for the response, after performing a logistic regression:
foo <- mtcars[,c("mpg","vs")]; names(foo) <- c("x","y")
mod <- glm(y ~ x, data = foo, family = binomial)
preddata <- with(foo, data.frame(x = seq(min(x), max(x), length = 100)))
preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)
critval <- 1.96 ## approx 95% CI
upr <- preds$fit + (critval * preds$se.fit)
lwr <- preds$fit - (critval * preds$se.fit)
fit <- preds$fit
fit2 <- mod$family$linkinv(fit)
upr2 <- mod$family$linkinv(upr)
lwr2 <- mod$family$linkinv(lwr)
Now, my question comes from the fact that you can get the response directly, simply by asking for
predict(..., type = 'response', se.fit = TRUE)
without transforming
predict(..., type = 'link', se.fit = TRUE)
However, this produces different standard errors. What are these errors and can they be directly used to calculate the confidence interval on the fitted response values?
The code in predict.glm
that is relevant is this:
switch(type, response = {
se.fit <- se.fit * abs(family(object)$mu.eta(fit))
fit <- family(object)$linkinv(fit)
}, link = , terms = )
and comparing outputs:
preds_2 <- predict(mod, newdata = preddata, type = "response", se.fit = TRUE)
> head(preds_2$fit)
1 2 3 4 5 6
0.01265744 0.01399994 0.01548261 0.01711957 0.01892627 0.02091959
> head(preds_2$se.fit)
1 2 3 4 5 6
0.01944681 0.02098841 0.02263473 0.02439022 0.02625902 0.02824491
It doesn't seem very obvious how to go from the above to:
> head(fit2)
1 2 3 4 5 6
0.01265744 0.01399994 0.01548261 0.01711957 0.01892627 0.02091959
> head(upr2)
1 2 3 4 5 6
0.2130169 0.2184891 0.2240952 0.2298393 0.2357256 0.2417589
> head(lwr2)
1 2 3 4 5 6
0.0006067975 0.0007205942 0.0008555502 0.0010155472 0.0012051633 0.0014297930
If you look at ?family
, you will see that $mu.eta
is the derivative of mu
over eta
(i.e., the derivative of inverse link function). Therefore, se.fit = TRUE
for type = "response"
gives a first order approximation for the true standard error. This is known as "delta method".
Of course, since inverse link function is generally non-linear, the confidence interval is not symmetric around fitted values. So getting the confidence interval on the link scale, then mapping it back to response scale is the way to go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With