Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to combine stat_ecdf with geom_ribbon?

Tags:

plot

r

ggplot2

I am trying to draw an ECDF of some data with a "confidence interval" represented via a shaded region using ggplot2. I am having trouble combining geom_ribbon() with stat_ecdf() to achieve the effect I am after.

Consider the following example data:

set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)
dat <- transform(dat, lower = variable - 2, upper = variable + 2)

> head(dat)
  variable     lower    upper
1 2.534484 0.5344838 4.534484
2 3.201587 1.2015872 5.201587
3 2.433602 0.4336018 4.433602
4 6.929713 4.9297132 8.929713
5 3.390284 1.3902836 5.390284
6 2.440225 0.4402254 4.440225

I am able to produce an ECDF of variable using

library("ggplot2")
ggplot(dat, aes(x = variable)) +
    geom_step(stat = "ecdf")

However I am unable to use lower and upper as the ymin and ymax aesthetics of geom_ribbon() to superimpose the confidence interval on the plot as another layer. I have tried:

ggplot(dat, aes(x = variable)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), stat = "ecdf") +
    geom_step(stat = "ecdf")

but this raises the following error

Error: geom_ribbon requires the following missing aesthetics: ymin, ymax

Is there a way to coax geom_ribbon() into working with stat_ecdf() to produce a shaded confidence interval? Or, can anyone suggest an alternative means of adding a shaded polygon defined by lower and upper as a layer to the ECDF plot?

like image 469
Gavin Simpson Avatar asked Nov 29 '13 02:11

Gavin Simpson


2 Answers

Try this (a bit of shot in the dark):

ggplot(dat, aes(x = variable)) +
  geom_ribbon(aes(x = variable,ymin = ..y..-2,ymax = ..y..+2), stat = "ecdf",alpha=0.2) +
  geom_step(stat = "ecdf")

Ok, so that's not the same thing as what you trying to do, but it should explain what's going on. The stat is returning a data frame with just the original x and the computed y, so I think that's all you have to work with. i.e. stat_ecdf only computes the cumulative distribution function for a single x at a time.

The only other thing I can think of is the obvious, calculating the lower and upper separately, something like this:

l <- ecdf(dat$lower)
u <- ecdf(dat$upper)
v <- ecdf(dat$variable)
dat$lower1 <- l(dat$variable)
dat$upper1 <- u(dat$variable)
dat$variable1 <- v(dat$variable)

ggplot(dat,aes(x = variable)) + 
  geom_step(aes(y = variable1)) + 
  geom_ribbon(aes(ymin = upper1,ymax = lower1),alpha = 0.2)
like image 136
joran Avatar answered Sep 28 '22 08:09

joran


Not sure exactly how you want to reflect the CI, but ggplot_build() lets you get the generated data back from the plot, you can then overplot what you like.

This chart shows:

  • red = original ribbon
  • blue = takes the original CI vectors and applies to the ecdf curve
  • green = calculates the ecdf of upper and lower series and plots

enter image description here

    g<-ggplot(dat, aes(x = variable)) +
      geom_step(stat = "ecdf") +
      geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.5, fill="red") 

    inside<-ggplot_build(g)
    matched<-merge(inside$data[[1]],data.frame(x=dat$variable,dat$lower,dat$upper),by=("x"))

    g + 
      geom_ribbon(data=matched, aes(x = x, 
                                      ymin = y + dat.upper-x,
                                      ymax = y - x + dat.lower), 
                    alpha=0.5, fill="blue") +
      geom_ribbon(data=matched, aes(x = x, 
                                      ymin = ecdf(dat.lower)(x),
                                      ymax = ecdf(dat.upper)(x)), 
                    alpha=0.5, fill="green")
like image 32
Troy Avatar answered Sep 28 '22 10:09

Troy