
Plotting the means with confidence intervals with ggplot

Tags:

r

I have some data that I have gathered from a model. I want to plot the size of a population over time. I have the population size at each time step, and 100 replicates. I would like to plot the mean population size for each time step, and also the 95% confidence intervals (as a shading if possible).

I haven't used ggplot before. I have just been using the ordinary (base) plots in R so far. But I want to see what the ggplot would look like.

Here's what I have so far:

ggplot(data=model1, aes(x=steps., y=`pop-size`)) + 
   geom_line(col='blue')

This plots all the points, and it looks good, but I don't know how to just plot the means and add the confidence intervals.

asked Sep 19 '15 at 14:09 by 91dpo



1 Answer

Since you have replicated data, and you want to plot mean/CL, you are probably better off using stat_summary(...) which is designed for (you guessed it) summarizing data. Basically, it applies a function to all the y-values for each x-value (so, the mean(...) function for example), and then plots the result using whatever geometry you specify. Here's an example:

# sample data - should be provided in question
set.seed(1)      # for reproducible example
time <- 1:25
df   <- data.frame(time,
                   pop=rnorm(100*length(time), mean=10*time/(25+time)))

library(ggplot2)
ggplot(df, aes(x=time, y=pop))+ 
  stat_summary(geom="ribbon", fun.data=mean_cl_normal, width=0.1, conf.int=0.95, fill="lightblue")+
  stat_summary(geom="line", fun.y=mean, linetype="dashed")+
  stat_summary(geom="point", fun.y=mean, color="red")

So here we have 3 layers: a layer that summarizes the y-values using the mean(...) function and plots using geom="line", a layer that summarizes the same way but plots using geom="point", and a layer that uses geom="ribbon". The ribbon geom requires ymin and ymax aesthetics, so we use the ggplot2 function mean_cl_normal (a wrapper around smean.cl.normal from the Hmisc package, which therefore needs to be installed) to generate those, based on the assumption that the error is normally distributed and that, therefore, the means follow a t-distribution. Type ?hmisc for documentation on the various functions that are useful for confidence limits. The layers render in the order of the code, so, since you want shading, we need to put the error ribbon first.
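To make the t-distribution assumption concrete, here is a minimal base R sketch of the calculation that mean_cl_normal is understood to perform for the y-values at a single x-value. The vector y is made-up data for illustration, not the asker's model output.

```r
# Sketch: t-based 95% confidence limits for the mean of one group of replicates.
set.seed(1)
y <- rnorm(100, mean = 5)        # hypothetical replicates at a single time step

n     <- length(y)
m     <- mean(y)
se    <- sd(y) / sqrt(n)         # standard error of the mean
tcrit <- qt(0.975, df = n - 1)   # two-sided 95% critical value of the t-distribution

# Same three quantities a ribbon needs: centre, lower limit, upper limit
ci <- c(y = m, ymin = m - tcrit * se, ymax = m + tcrit * se)
ci
```

The limits agree with the interval reported by t.test(y)$conf.int, which is a quick way to check the arithmetic.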

Finally, it is of course possible to summarize the data yourself, using dplyr or some such, but I don't really see the point of doing that.
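For completeness, a rough sketch of the summarize-it-yourself route, using base R instead of dplyr to keep dependencies down. The helper name ci_summary and the column names mean, lower and upper are made up for this illustration; the sample data are the same simulated df as above.

```r
library(ggplot2)

# Same simulated data as in the example above
set.seed(1)
time <- 1:25
df   <- data.frame(time,
                   pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

# Hypothetical helper: mean and t-based confidence limits for one group
ci_summary <- function(y, level = 0.95) {
  n    <- length(y)
  half <- qt((1 + level) / 2, df = n - 1) * sd(y) / sqrt(n)
  c(mean = mean(y), lower = mean(y) - half, upper = mean(y) + half)
}

# One row per time step: split pop by time, summarize each group, bind the rows
stats <- do.call(rbind, lapply(split(df$pop, df$time), ci_summary))
summ  <- data.frame(time = as.numeric(rownames(stats)), stats)

# Plot the pre-computed summary; ribbon first so it sits behind the line and points
p <- ggplot(summ, aes(x = time, y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue") +
  geom_line(linetype = "dashed") +
  geom_point(color = "red")
p
```

This should give the same picture as the stat_summary version; the main thing it buys you is a summary table (summ) you can inspect or export.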

Update (based on recent comment): Looks like the most recent version of ggplot2 (2.0.0) has a different way of specifying the arguments to fun.data. This works in the new version:

ggplot(df, aes(x=time, y=pop))+ 
    stat_summary(geom="ribbon", fun.data=mean_cl_normal, 
                 fun.args=list(conf.int=0.95), fill="lightblue")+
    stat_summary(geom="line", fun.y=mean, linetype="dashed")+
    stat_summary(geom="point", fun.y=mean, color="red")

The problem with the width=... argument is a bit more subtle I think: it actually isn't needed (in the original answer I used error bars, and forgot to remove this argument when I changed it to ribbon). The older version of ggplot2 ignored extraneous arguments (hence, no error). The new version, evidently, is more strict. Probably this is better.
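A further note for readers on newer releases: from ggplot2 3.3.0 onwards, fun.y is deprecated in favour of fun, so the call above will likely produce a deprecation warning. A sketch of the same plot with the newer argument name, assuming ggplot2 >= 3.3.0 and that the Hmisc package is installed (mean_cl_normal depends on it):

```r
library(ggplot2)   # assumes ggplot2 >= 3.3.0; mean_cl_normal also needs Hmisc installed

# Same simulated data as in the example above
set.seed(1)
time <- 1:25
df   <- data.frame(time,
                   pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

p <- ggplot(df, aes(x = time, y = pop)) +
  # ribbon first so the shading is drawn underneath the line and points
  stat_summary(geom = "ribbon", fun.data = mean_cl_normal,
               fun.args = list(conf.int = 0.95), fill = "lightblue") +
  # fun replaces the deprecated fun.y
  stat_summary(geom = "line", fun = mean, linetype = "dashed") +
  stat_summary(geom = "point", fun = mean, color = "red")
p
```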

answered Oct 01 '22 at 22:10 by jlhoward
