Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What standard errors are returned with predict.glm(..., type = "response", se.fit = TRUE)?

I am going to fit the model on the data provided in this excellent example on how to compute the 95% confidence interval for the response, after performing a logistic regression:

foo <- mtcars[,c("mpg","vs")]; names(foo) <- c("x","y")
mod <- glm(y ~ x, data = foo, family = binomial)
preddata <- with(foo, data.frame(x = seq(min(x), max(x), length = 100)))
preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)

critval <- 1.96 ## approx 95% CI
upr <- preds$fit + (critval * preds$se.fit)
lwr <- preds$fit - (critval * preds$se.fit)
fit <- preds$fit

fit2 <- mod$family$linkinv(fit)
upr2 <- mod$family$linkinv(upr)
lwr2 <- mod$family$linkinv(lwr)

Now, my question comes from the fact that you can get the response directly, simply by asking for

predict(..., type = 'response', se.fit = TRUE)

without transforming

predict(..., type = 'link', se.fit = TRUE)

However, this produces different standard errors. What are these errors and can they be directly used to calculate the confidence interval on the fitted response values?

The code in predict.glm that is relevant is this:

switch(type, response = {
    se.fit <- se.fit * abs(family(object)$mu.eta(fit))
    fit <- family(object)$linkinv(fit)
}, link = , terms = )

and comparing outputs:

preds_2 <- predict(mod, newdata = preddata, type = "response", se.fit = TRUE)


> head(preds_2$fit)
         1          2          3          4          5          6 
0.01265744 0.01399994 0.01548261 0.01711957 0.01892627 0.02091959 

> head(preds_2$se.fit)
         1          2          3          4          5          6 
0.01944681 0.02098841 0.02263473 0.02439022 0.02625902 0.02824491 

It doesn't seem very obvious how to go from the above to:

> head(fit2)
         1          2          3          4          5          6 
0.01265744 0.01399994 0.01548261 0.01711957 0.01892627 0.02091959 
> head(upr2)
        1         2         3         4         5         6 
0.2130169 0.2184891 0.2240952 0.2298393 0.2357256 0.2417589 
> head(lwr2)
           1            2            3            4            5            6 
0.0006067975 0.0007205942 0.0008555502 0.0010155472 0.0012051633 0.0014297930 
like image 470
Alex Avatar asked Jun 14 '17 03:06

Alex


1 Answers

If you look at ?family, you will see that $mu.eta is the derivative of mu over eta (i.e., the derivative of inverse link function). Therefore, se.fit = TRUE for type = "response" gives a first order approximation for the true standard error. This is known as "delta method".

Of course, since inverse link function is generally non-linear, the confidence interval is not symmetric around fitted values. So getting the confidence interval on the link scale, then mapping it back to response scale is the way to go.

like image 52
Zheyuan Li Avatar answered Oct 13 '22 21:10

Zheyuan Li