
R: Plot a time series with quantiles using ggplot2

I need to plot a time series with ggplot2. For each point of the time series I also have some quantiles, say 0.05, 0.25, 0.5, 0.75 and 0.95, i.e. I have five values for each point. For example:

time           quantile=0.05  quantile=0.25 quantile=0.5  quantile=0.75   quantile=0.95
00:01          623.0725       630.4353      903.8870       959.1407       1327.721
00:02          623.0944       631.3707      911.9967      1337.4564       1518.539
00:03          623.0725       630.4353      903.8870      1170.8316       1431.893
00:04          623.0725       630.4353      903.8870      1336.3212       1431.893
00:05          623.0835       631.3557      905.4220      1079.6623       1452.260
00:06          623.0835       631.3557      905.4220      1079.6623       1452.260
00:07          623.0835       631.3557      905.4220      1079.6623       1452.260
00:08          623.0780       631.3483      905.3496      1056.3719       1375.610
00:09          623.0671       630.4275      903.8839      1170.8196       1356.963
00:10          623.0507       630.0261      741.8475      1006.1208       1462.271

Ideally, I would like to have the 0.5 quantile as a black line and the others as shaded color intervals surrounding the black line. What's the best way to do this? I've been looking around with no luck, I can't find examples of this, even less with ggplot2.

Any help would be appreciated.

Cheers!

jla asked Jun 14 '11 07:06


2 Answers

Does this do what you want? The trick to ggplot is understanding that it expects data in long format. This often means that we have to transform the data before it is ready to plot, usually with melt().

After reading your data in with textConnection() and creating an object named dat, here are the steps you'd take:
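For completeness, here is a minimal sketch of that reading step. It assumes the table from the question is pasted in as a string (the name txt is just for illustration, and only the first three rows are included to keep it short). Note that read.table() makes the header names syntactic, which is why the columns are referred to as quantile.0.05, quantile.0.5 and so on in the code that follows.

```r
# A sketch of the reading step; only the first three rows of the question's
# table are included here to keep the example short.
txt <- "time           quantile=0.05  quantile=0.25 quantile=0.5  quantile=0.75   quantile=0.95
00:01          623.0725       630.4353      903.8870       959.1407       1327.721
00:02          623.0944       631.3707      911.9967      1337.4564       1518.539
00:03          623.0725       630.4353      903.8870      1170.8316       1431.893"

# header = TRUE uses the first line as column names; with the default
# check.names = TRUE, "quantile=0.05" becomes the syntactic name "quantile.0.05".
dat <- read.table(textConnection(txt), header = TRUE)

names(dat)
str(dat)
```

If you would rather keep the original headers, read.table() also accepts check.names = FALSE, but then every column has to be quoted with backticks inside aes().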

library(ggplot2)
library(reshape2)  # provides melt()

# Melt into long format
dat.m <- melt(dat, id.vars = "time")

# Not necessary, but if you want different line types depending on quantile, here's how I'd do it.
# lty is stored as a factor because ggplot2 cannot map a continuous variable to linetype.
dat.m <- within(dat.m,
  lty <- factor(ifelse(variable == "quantile.0.5", 1,
    ifelse(variable %in% c("quantile.0.25", "quantile.0.75"), 2, 3)))
)

# Plot it
ggplot(dat.m, aes(time, value, group = variable, colour = variable, linetype = lty)) +
  geom_line() +
  scale_colour_manual(name = "", values = c("red", "blue", "black", "blue", "red"))

This gives one coloured line per quantile, with the median (0.5) in black.

After reading your question again, maybe you want shaded ribbons around the median estimate instead of lines? If so, give this a whirl. The only real trick here is that we pass group = 1 as an aesthetic so that geom_line() and geom_ribbon() connect the points when time is factor or character data; otherwise each time value would be treated as its own group. Previously, we grouped by variable, which had the same effect. Also note that we are no longer using the melted data.frame, as the wide data.frame will suit us just fine in this case.

ggplot(dat, aes(x = time, group = 1)) +
  geom_ribbon(aes(ymin = quantile.0.05, ymax = quantile.0.95, fill = "05%-95%"), alpha = .25) + 
  geom_ribbon(aes(ymin = quantile.0.25, ymax = quantile.0.75, fill = "25%-75%"), alpha = .25) +
  geom_line(aes(y = quantile.0.5)) +
  scale_fill_manual(name = "", values = c("25%-75%" = "red", "05%-95%" = "blue")) 
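An alternative to group = 1, sketched here under the assumption that the time column holds "HH:MM" strings, is to parse it into a date-time class so that the x axis is continuous. The column name time_parsed is made up for this example, the data frame repeats only the first three rows of the question's table, and the plotting part is guarded so that it only runs if ggplot2 is installed.

```r
# Assumption: the time column holds "HH:MM" strings, as in the question's table.
dat <- data.frame(
  time = c("00:01", "00:02", "00:03"),
  quantile.0.05 = c(623.0725, 623.0944, 623.0725),
  quantile.0.25 = c(630.4353, 631.3707, 630.4353),
  quantile.0.5  = c(903.8870, 911.9967, 903.8870),
  quantile.0.75 = c(959.1407, 1337.4564, 1170.8316),
  quantile.0.95 = c(1327.721, 1518.539, 1431.893)
)

# Parse into POSIXct; the date part defaults to the current date and is
# irrelevant here because only hours and minutes are shown on the axis.
dat$time_parsed <- as.POSIXct(dat$time, format = "%H:%M", tz = "UTC")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # With a continuous x axis, no group aesthetic is needed.
  p <- ggplot(dat, aes(x = time_parsed)) +
    geom_ribbon(aes(ymin = quantile.0.05, ymax = quantile.0.95), fill = "blue", alpha = .25) +
    geom_ribbon(aes(ymin = quantile.0.25, ymax = quantile.0.75), fill = "red", alpha = .25) +
    geom_line(aes(y = quantile.0.5)) +
    scale_x_datetime(date_labels = "%H:%M")
  print(p)
}
```

The fills are set as constants here rather than mapped inside aes(), so this version draws no legend; map them inside aes() as above if you want one.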


Edit: To force a legend for the predicted value

We can use the same approach we used for the geom_ribbon() layers. We'll add an aesthetic to geom_line() and then set the values of that aesthetic with scale_colour_manual():

ggplot(dat, aes(x = time, group = 1)) +
  geom_ribbon(aes(ymin = quantile.0.05, ymax = quantile.0.95, fill = "05%-95%"), alpha = .25) + 
  geom_ribbon(aes(ymin = quantile.0.25, ymax = quantile.0.75, fill = "25%-75%"), alpha = .25) +
  geom_line(aes(y = quantile.0.5, colour = "Predicted")) +
  scale_fill_manual(name = "", values = c("25%-75%" = "red", "05%-95%" = "blue")) +
  scale_colour_manual(name = "", values = c("Predicted" = "black"))

There may be more efficient ways to do that, but that's the way I've always used and have had pretty good success with it. YMMV.

Chase answered Sep 19 '22 19:09


Assuming your data.frame is called df:

The easiest ggplot solution is to use the boxplot geom. This gives a black central line at the median, a filled box from the 0.25 to the 0.75 quantile, and whiskers out to the 0.05 and 0.95 quantiles.

Since you have pre-summarised your data, it is important to specify the stat="identity" parameter:

ggplot(df, aes(x=time)) + 
    geom_boxplot(
        aes(
          lower=quantile.0.25, 
          upper=quantile.0.75,
          middle=quantile.0.5,
          ymin=quantile.0.05,
          ymax=quantile.0.95
        ), 
        stat="identity",
        fill = "cyan"
)
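If you were starting from raw observations rather than pre-summarised values, the same five columns could be computed with base R's quantile(). The sketch below uses simulated data: the raw data frame, its obs column and the distribution parameters are all made up for illustration and are not from the question.

```r
set.seed(1)  # simulated data, purely for illustration

# Hypothetical raw data: 200 observations for each of three time points.
raw <- data.frame(
  time = rep(c("00:01", "00:02", "00:03"), each = 200),
  obs  = rlnorm(600, meanlog = 6.8, sdlog = 0.3)
)

probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)

# One row of quantiles per time point; names = FALSE drops the "5%", "25%", ... labels.
q <- t(sapply(split(raw$obs, raw$time), quantile, probs = probs, names = FALSE))

# Assemble a wide data.frame with the same column names used in the plots.
df <- data.frame(time = rownames(q), q)
names(df)[-1] <- paste0("quantile.", probs)
rownames(df) <- NULL
df
```

The resulting df has the same layout as the question's table, so it can be passed to the geom_boxplot() call with stat = "identity" unchanged.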


PS. I recreated your data as follows:

df <- "time           quantile=0.05  quantile=0.25 quantile=0.5  quantile=0.75   quantile=0.95
00:01          623.0725       630.4353      903.8870       959.1407       1327.721
00:02          623.0944       631.3707      911.9967      1337.4564       1518.539
00:03          623.0725       630.4353      903.8870      1170.8316       1431.893
00:04          623.0725       630.4353      903.8870      1336.3212       1431.893
00:05          623.0835       631.3557      905.4220      1079.6623       1452.260
00:06          623.0835       631.3557      905.4220      1079.6623       1452.260
00:07          623.0835       631.3557      905.4220      1079.6623       1452.260
00:08          623.0780       631.3483      905.3496      1056.3719       1375.610
00:09          623.0671       630.4275      903.8839      1170.8196       1356.963
00:10          623.0507       630.0261      741.8475      1006.1208       1462.271"

df <- read.table(textConnection(df), header=TRUE)

Andrie answered Sep 23 '22 19:09