Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot2: plotting non-contiguous time durations as a bar chart

Tags:

r

ggplot2

I'm using ggplot to plot various events as a function of the date (x-axis) and start time (y-axis) on which they began. The data/code are as follows:

date<-c("2013-06-05","2013-06-05","2013-06-04","2013-06-04","2013-06-04","2013-06-04","2013-06-04",
    "2013-06-04","2013-06-04","2013-06-03","2013-06-03","2013-06-03","2013-06-03","2013-06-03",
    "2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02")
start <-c("07:36:00","01:30:00","22:19:00","22:12:00","20:16:00","19:19:00","09:00:00",
     "06:45:00","01:03:00","22:15:00","19:05:00","08:59:00","08:01:00","07:08:00",
     "23:24:00","20:39:00","18:53:00","16:57:00","15:07:00","14:33:00","13:24:00")
duration <-c(0.5,6.1,2.18,0.12,1.93,0.95,10.32,
         2.25,5.7,2.78,3.17,9.03,0.95,0.88,
         7.73,2.75,1.77,1.92,1.83,0.57,1.13)
event <-c("AF201","SS431","BE201","CD331","HG511","CD331","WQ115",
      "CD331","SS431","WQ115","HG511","WQ115","CD331","AF201",
      "SS431","WQ115","HG511","WQ115","CD331","AS335","CD331")

df<-data.frame(date,start,duration,event)

library(ggplot2)
library(scales)

p <- ggplot(df, aes(as.Date(date),as.POSIXct(start,format='%H:%M:%S'),color=event))
p <- p+geom_point(alpha = I(6/10),size=5) 
p + ylab("time (hr)") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_colour_hue(h=c(360, 90))
theme(axis.text.x = element_text(hjust=1, angle=0))  

The resulting plot looks like this:

enter image description here

Question: Instead of simply indicating the start time of the event with a single point (shown above), how can I plot a bar that spans the time duration of the event? As shown in the data frame above I have this duration data (in hours). Alternatively, I could supply a 'stop time' (not shown).

I'm imagining the solution would look something like a stacked bar chart. However, a bar chart isn't quite right as it assumes the bar starts at the bottom of the plot and that the vertically stacked events have no gaps between them. My events may be non-contiguous -- 'starting' and 'stopping' at various positions along the y-axis. The solution will also have to take into consideration that 1) some events may ultimately be concurrent (overlap in time) and 2) some events will span multiple days.

I'd be very grateful for any suggestions!

like image 426
Kappa Avatar asked Jun 15 '13 06:06

Kappa


2 Answers

It's a bit unclear exactly what you want - @Michele's answer seemed good, I wasn't clear if you wanted to to use geom_rect because it would make for thicker lines (if so, just change the line width), or if there was another reason. I decided to give it a go using geom_rect to enable dodging. I've plotted it with the starting date on the x axis, and the start and end times on y. I've set up the data slightly differently to enable that. If you're after something different, try to make it explicit, but at least here's another option:

df<-data.frame(date,start,duration,event)

df <- transform(df,
                start = as.POSIXct(paste(date, start)),
                end   = as.POSIXct(paste(date, start)) + duration*3600)

df <- df[c("event", "start", "end")]

df$date <- strptime(df$start, "%Y-%m-%d")
df$start.new <- format(df$start, format = "%H:%M:%S")
df$end.new <- format(df$end, format = "%H:%M:%S")
df$day <- factor(as.POSIXct(df$date))
levels(df$day) <- 1:4
df$day <- as.numeric(as.character(df$day))
df$event.int <- df$event
levels(df$event.int) <- 1:7
df$event.int <- as.numeric(as.character(df$event.int))

p <- ggplot(df, aes(day, start)) + geom_rect(aes(ymin = start, ymax = end,
                                            xmin = (day - 0.45) + event.int/10,
                                            xmax = (day - 0.35) + event.int/10,
                                            fill = event)) +
  scale_x_discrete(limits = 1:4,breaks = 1:4, labels = sort(unique(date)),
                   name = "Start date") + ylab("Duration")

enter image description here

like image 137
alexwhan Avatar answered Sep 22 '22 01:09

alexwhan


Thanks (+1s) to @Michele and @alexwhan for your input. Using geom_rect I was able to get all of the events which occur on the same date on the same point on the x axis. (I'm anticipating that this data set may ultimately include many months of events.)

df<-data.frame(date,start,duration,event)

library(ggplot2)

p <- ggplot(df, aes(xmin=as.Date(date),xmax=as.Date(date)+1,
                    ymin=as.POSIXct(start,format='%H:%M:%S'),
                    ymax=as.POSIXct(start,format='%H:%M:%S')+duration*3600,
                    fill=event))
p <- p+geom_rect(alpha = I(8/10)) 
p + ylab("time") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_colour_hue(h=c(360, 90))
theme(axis.text.x = element_text(hjust=1, angle=0))   

... resulting in this: enter image description here

This is pretty close to what I was aiming for. I think I can deal with the potential overplotting issue by adjusting the alpha. Ideally I'd like the y axis to include just a single day (00 to 00). To do this I guess I'll probably need to reformat the data such that events with durations that extend beyond midnight are reallocated to the next day. (Not sure how to do this in R.)

like image 36
Kappa Avatar answered Sep 20 '22 01:09

Kappa