Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Predicted values for logistic regression from glm and stat_smooth in ggplot2 are different

Tags:

r

ggplot2

I'm trying to make this logistic regression graph in ggplot2.

df <- structure(list(y = c(2L, 7L, 776L, 19L, 12L, 26L, 7L, 12L, 8L,
24L, 20L, 16L, 12L, 10L, 23L, 20L, 16L, 12L, 18L, 22L, 23L, 22L,
13L, 7L, 20L, 12L, 13L, 11L, 11L, 14L, 10L, 8L, 10L, 11L, 5L,
5L, 1L, 2L, 1L, 1L, 0L, 0L, 0L), n = c(3L, 7L, 789L, 20L, 14L,
27L, 7L, 13L, 9L, 29L, 22L, 17L, 14L, 11L, 30L, 21L, 19L, 14L,
22L, 29L, 28L, 28L, 19L, 10L, 27L, 22L, 18L, 18L, 14L, 23L, 18L,
12L, 19L, 15L, 13L, 9L, 7L, 3L, 1L, 1L, 1L, 1L, 1L), x = c(18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L,
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 59L,
62L, 63L, 66L)), .Names = c("y", "n", "x"), class = "data.frame", row.names = c(NA,
-43L))


mod.fit <- glm(formula = y/n ~ x, data = df, weight=n, family = binomial(link = logit),
        na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = T))
summary(mod.fit)

Pi <- c(0.25, 0.5, 0.75)
LD <- (log(Pi /(1-Pi))-mod.fit$coefficients[1])/mod.fit$coefficients[2]
LD.summary <- data.frame(Pi , LD)
LD.summary


plot(df$x, df$y/df$n, xlab = "x", ylab = "Estimated probability")

lin.pred <- predict(mod.fit)
pi.hat <- exp(lin.pred)/(1 + exp(lin.pred))
lines(df$x, pi.hat, lty = 1, col = "red")


segments(x0 = LD.summary$LD, y0 = -0.1, x1 = LD.summary$LD, y1 = LD.summary$Pi,
         lty=2, col=c("darkblue","darkred","darkgreen"))
segments(x0 = 15, y0 = LD.summary$Pi, x1 = LD.summary$LD, y1 = LD.summary$Pi,
         lty=2, col=c("darkblue","darkred","darkgreen"))
legend("bottomleft", legend=c("LD25", "LD50", "LD75"), lty=2, col=c("darkblue","darkred","darkgreen"), bty="n", cex=0.75)

enter image description here

Here is my attempt with ggplot2

library(ggplot2)

p <- ggplot(data = df, aes(x = x, y = y/n)) +
            geom_point() +
            stat_smooth(method = "glm", family = "binomial")

p <- p + geom_segment(aes(
                            x = LD.summary$LD
                          , y = 0
                          , xend = LD.summary$LD
                          , yend = LD.summary$Pi
                         )
                         , colour="red"
                       )

p <- p + geom_segment(aes(
                            x = 0
                          , y = LD.summary$Pi
                          , xend = LD.summary$LD
                          , yend = LD.summary$Pi
                         )
                         , colour="red"
                       )

print(p)

enter image description here

Questions

  1. Predicted values for glm and stat_smooth look different. Are these two methods produces different results or I'm missing something here.
  2. My ggplot2 graph is not exactly as base R graph.
  3. How to use different colours for line segments in ggplot2?
  4. And how to put legend in ggplot2?

Thanks in advance for your help and time. Thanks

like image 464
MYaseen208 Avatar asked Jan 13 '12 02:01

MYaseen208


1 Answers

Just a couple of minor additions to @mathetmatical.coffee's answer. Typically, geom_smooth isn't supposed to replace actual modeling, which is why it can seem inconvenient at times when you want to use specific output you'd get from glm and such. But really, all we need to do is add the fitted values to our data frame:

df$pred <- pi.hat
LD.summary$group <- c('LD25','LD50','LD75')

ggplot(df,aes(x = x, y = y/n)) + 
    geom_point() + 
    geom_line(aes(y = pred),colour = "black") + 
    geom_segment(data=LD.summary, aes(y = Pi,
                                      xend = LD,
                                      yend = Pi,
                                      col = group),x = -Inf,linetype = "dashed") + 
    geom_segment(data=LD.summary,aes(x = LD,
                                     xend = LD,
                                     yend = Pi,
                                     col = group),y = -Inf,linetype = "dashed")

enter image description here

The final little trick is the use of Inf and -Inf to get the dashed lines to extend all the way to the plot boundaries.

The lesson here is that if all you want to do is add a smooth to a plot, and nothing else in the plot depends on it, use geom_smooth. If you want to refer to the output from the fitted model, its generally easier to fit the model outside ggplot and then plot.

like image 87
joran Avatar answered Sep 27 '22 20:09

joran