I have some data that I have gathered from a model. I want to plot the size of a population over time. I have the population size at each time step, and 100 replicates. I would like to plot the mean population size for each time step, and also the 95% confidence intervals (as a shading if possible).
I haven't used ggplot before. I have just been using the ordinary (base) plots in R so far, but I want to see what the ggplot version would look like.
Here's what I have so far:
ggplot(data=model1, aes(x=steps., y= pop-size, col='blue')) +
geom_line()
This plots all the points, and it looks good, but I don't know how to just plot the means and add the confidence intervals.
Since you have replicated data and want to plot the mean with confidence limits, you are probably better off using stat_summary(...), which is designed for (you guessed it) summarizing data. Basically, it applies a function to all the y-values for each x-value (the mean(...) function, for example), and then plots the result using whatever geometry you specify. Here's an example:
# sample data - should be provided in question
set.seed(1) # for reproducible example
time <- 1:25
df <- data.frame(time,
pop=rnorm(100*length(time), mean=10*time/(25+time)))
library(ggplot2)
ggplot(df, aes(x=time, y=pop))+
stat_summary(geom="ribbon", fun.data=mean_cl_normal, width=0.1, conf.int=0.95, fill="lightblue")+
stat_summary(geom="line", fun.y=mean, linetype="dashed")+
stat_summary(geom="point", fun.y=mean, color="red")
So here we have 3 layers: a layer that summarizes the y-values using the mean(...) function and plots using geom="line", a layer that summarizes the same way but plots using geom="point", and a layer that uses geom="ribbon". This geom requires ymin and ymax aesthetics, so we use the ggplot2 function mean_cl_normal (a wrapper around smean.cl.normal from the Hmisc package, which must be installed) to generate those, based on the assumption that the error is normally distributed and that, therefore, the means follow a t-distribution. Type ?hmisc for documentation on the various functions that are useful for confidence limits. The layers render in the order of the code, so, since you want shading, we need to put the error ribbon first.
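To make the confidence-limit calculation concrete, here is a minimal base-R sketch of what a normal-theory interval for the mean looks like for the replicates at a single time step. The helper name mean_ci is hypothetical (it is not part of ggplot2 or Hmisc); it is only meant to mirror the y/ymin/ymax columns that a fun.data function is expected to return.

```r
# Hypothetical helper (not part of ggplot2 or Hmisc): t-based confidence
# interval for the mean of a numeric vector, returned as y / ymin / ymax.
mean_ci <- function(y, conf.int = 0.95) {
  n  <- length(y)
  m  <- mean(y)
  se <- sd(y) / sqrt(n)                          # standard error of the mean
  tq <- qt(1 - (1 - conf.int) / 2, df = n - 1)   # two-sided t quantile
  data.frame(y = m, ymin = m - tq * se, ymax = m + tq * se)
}

set.seed(1)                       # for a reproducible example
y  <- rnorm(100, mean = 5)        # 100 replicates at one time step
ci <- mean_ci(y)
ci
```

The interval should agree with the one reported by t.test(y)$conf.int, which is a convenient sanity check.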
Finally, it is of course possible to summarize the data yourself, using dplyr or some such, but I don't really see the point of doing that.
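For comparison, a manual summary might look something like the sketch below. It assumes the same sample data as above and that dplyr is installed; the column names pop_mean, se, lower and upper are made up for illustration. The limits are computed with the same t-based formula, and the plot then uses plain geoms instead of stat_summary.

```r
library(dplyr)
library(ggplot2)

# same sample data as in the example above
set.seed(1)
time <- 1:25
df <- data.frame(time,
                 pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

# one row per time step: mean and 95% t-based confidence limits
summ <- df %>%
  group_by(time) %>%
  summarise(n        = n(),
            pop_mean = mean(pop),
            se       = sd(pop) / sqrt(n)) %>%
  mutate(tq    = qt(0.975, df = n - 1),
         lower = pop_mean - tq * se,
         upper = pop_mean + tq * se)

# ribbon first so the line and points are drawn on top of the shading
p_manual <- ggplot(summ, aes(x = time)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue") +
  geom_line(aes(y = pop_mean), linetype = "dashed") +
  geom_point(aes(y = pop_mean), color = "red")
p_manual
```

This is more code for the same picture, but it leaves you with the summary table (summ) in case you need the numbers as well.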
Update (based on recent comment):
It looks like the most recent version of ggplot2 (2.0.0) has a different way of specifying the arguments to fun.data. This works in the new version:
ggplot(df, aes(x=time, y=pop))+
stat_summary(geom="ribbon", fun.data=mean_cl_normal,
fun.args=list(conf.int=0.95), fill="lightblue")+
stat_summary(geom="line", fun.y=mean, linetype="dashed")+
stat_summary(geom="point", fun.y=mean, color="red")
The problem with the width=... argument is a bit more subtle, I think: it actually isn't needed (in the original answer I used error bars, and forgot to remove this argument when I changed it to a ribbon). The older version of ggplot2 ignored extraneous arguments (hence, no error). The new version, evidently, is stricter. This is probably better.
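A note for readers on later releases: since ggplot2 3.3.0 the fun.y argument of stat_summary is deprecated in favor of fun (and fun.ymin/fun.ymax in favor of fun.min/fun.max), so the code above runs but gives a deprecation warning. The sketch below is the same plot with that one rename, assuming ggplot2 3.3.0 or later and that the Hmisc package is installed for mean_cl_normal.

```r
library(ggplot2)   # assumes ggplot2 >= 3.3.0; mean_cl_normal needs Hmisc installed

# same sample data as in the example above
set.seed(1)
time <- 1:25
df <- data.frame(time,
                 pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

p <- ggplot(df, aes(x = time, y = pop)) +
  # shaded 95% confidence band, drawn first so it sits underneath
  stat_summary(geom = "ribbon", fun.data = mean_cl_normal,
               fun.args = list(conf.int = 0.95), fill = "lightblue") +
  # mean line and mean points; `fun` replaces the deprecated `fun.y`
  stat_summary(geom = "line", fun = mean, linetype = "dashed") +
  stat_summary(geom = "point", fun = mean, color = "red")
p
```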