Plot durations as lines with ggplot2

For the last few days I've been pondering the following problem: I want to plot the duration of an activity as a line, with the y axis representing the date the activity happens and the x axis representing the time in hours. When an activity starts and ends on the same day, this is easy.

For example:

df1<-structure(list(Date = structure(c(16802, 16803, 16805, 16806, 
16809, 16810, 16812, 16813, 16816, 16820, 16821, 16822, 16829
), class = "Date"), hms1 = structure(c(1457623680, 1457620860, 
1457621160, 1457622540, 1457625600, 1457621280, 1457620380, 1457619720, 
1457623620, 1457621460, 1457620440, 1457617980, 1457621880), class =      c("POSIXct", "POSIXt"), tzone = ""), hms2 = structure(c(1457632500, 1457627640, 
1457628360, 1457629500, 1457631000, 1457625120, 1457625420, 1457624520, 
1457627640, 1457626800, 1457626800, 1457622060, 1457625540), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("Date", "hms1", "hms2"), class =     "data.frame", row.names = c(103L, 105L, 108L, 110L, 114L, 117L, 120L, 122L, 127L, 135L, 136L, 138L, 145L))

library(ggplot2); library(scales)  # scales provides date_breaks()
p1 <- ggplot(df1, aes(x = Date, y = hms1)) + scale_x_date(breaks = date_breaks("1 day")) +
  geom_linerange(aes(ymin = hms1, ymax = hms2), color = "red", size = 2) + coord_flip()
p1 + ylab("Time") + ggtitle("Activity During Day")

This gives the desired plot: [image: ggplot1]

Note that the real date is in the Date column, while the date parts of hms1 and hms2 are wrong because of the process I used to get the time in H:M:S format:

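# df1$time is not part of the dput() output above.
df1$hms1 <- format(df1$time, format = "%H:%M:%S")
# Parsing a time-only string fills in the current date, hence the wrong dates.
df1$hms1 <- as.POSIXct(df1$hms1, format = "%H:%M:%S")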

The thing is that most of the "activities" I'm trying to plot span two days. The only solution I have come up with so far is shifting both datetime values by an appropriate period, in order to "artificially" place them in the same day, as suggested here. This process, though, naturally results in the x axis displaying the "artificial" timestamps.

How can I take control of the tick marks in ggplot so that they display the right timestamps? Or, even better, can someone propose a more elegant solution? To be clear: I want the y-axis to be just the date of the start of the event, but the x-axis to span more than 24 hours.

Stelios K. asked Mar 16 '16 15:03




1 Answer

I'm not sure what your real data structure looks like, so I created some time periods for illustration as follows: I used your values for Date and then used runif to select some random start and end times for each activity. Some time periods occur within a single date and some span two dates.

I then plotted the data as follows: each activity is a horizontal line positioned vertically at the date when the activity began. The line begins at the time (in hours) when the activity started, measured from midnight on the start date, and then extends to the right for however many hours the activity lasted.

The horizontal scale extends to 48 hours, so that activities covering more than one date can be visualized. Vertical lines mark date boundaries, so you can see which activities continued into a second day. If you have activities lasting three or more days, just extend the horizontal scale as needed to accommodate them.
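Extending the scale to more days mostly means regenerating the breaks, the wrapped labels, and the day-boundary lines. Here is a minimal sketch of that bookkeeping in base R. Note that `hour_axis()` and its arguments are hypothetical names I made up for illustration, not part of ggplot2 or scales, and the sketch assumes `step` is a whole number of hours.

```r
# Hypothetical helper (illustration only): builds limits, breaks,
# clock-style labels and day-boundary positions for an axis of n_days days.
# Assumes `step` is a whole number of hours.
hour_axis <- function(n_days = 2, step = 2) {
  breaks <- seq(0, 24 * n_days, by = step)
  list(
    limits    = c(0, 24 * n_days),
    breaks    = breaks,
    labels    = sprintf("%02d", breaks %% 24),  # hours wrap at 24: 26 -> "02"
    day_lines = seq(0, 24 * n_days, by = 24)    # positions where a new date starts
  )
}

ax <- hour_axis(n_days = 3)
ax$labels[ax$breaks %in% c(22, 24, 26)]
```

The pieces should be usable as `scale_y_continuous(limits = ax$limits, breaks = ax$breaks, labels = ax$labels)` and `geom_hline(yintercept = ax$day_lines)`.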

# Convert Date to POSIXct and set HMS to beginning of day (i.e., Midnight).
# To do this, I added 8 hours because my locale is the U.S. Pacific time zone;
# the offset is specific to that time zone, so adjust it for yours.
df1$Date = as.POSIXct(df1$Date) + 3600*8

## Create some times for start and end of each activity
set.seed(20)
df1$hms1a = df1$Date  + runif(nrow(df1), 3600*5, 3600*10)
df1$hms2a = df1$Date + runif(nrow(df1), 3600*15, 3600*40)

library(ggplot2)
library(scales)
library(stringr)

ggplot(df1, aes(x=Date)) + 
  scale_x_datetime(breaks = date_breaks("1 day")) +
  scale_y_continuous(limits = c(0,48), breaks=seq(0,48,2), 
                     labels=str_pad(seq(0,48,2) %% 24, 2, pad="0")) +
  geom_hline(yintercept=seq(0,48,24)) +
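  # difftime(..., units = "hours") pins the unit; plain subtraction lets R choose it.
  geom_linerange(aes(ymin = as.numeric(difftime(hms1a, Date, units = "hours")),
                     ymax = as.numeric(difftime(hms2a, Date, units = "hours"))),
                 color = "red", size = 2) + 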
  coord_flip() + ylab("Time (hours)") + 
  ggtitle("Activity During Day")
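If your real data has actual start and end timestamps, the "hours since midnight of the start date" values can probably be computed without a hard-coded time zone offset. The following is only a sketch under assumptions: the data frame `acts`, its column names, and the use of UTC are all made up for illustration and may not match your data.

```r
# Toy data (invented for illustration): the first activity runs past
# midnight, the second ends on the day it started.
acts <- data.frame(
  start = as.POSIXct(c("2016-01-02 22:30:00", "2016-01-03 09:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-03 03:15:00", "2016-01-03 11:45:00"), tz = "UTC")
)

# Midnight of each start date, in the same time zone as the timestamps.
acts$day0 <- as.POSIXct(format(acts$start, "%Y-%m-%d"), tz = "UTC")

# Force units = "hours" so difftime() never switches to days or minutes.
acts$start_h <- as.numeric(difftime(acts$start, acts$day0, units = "hours"))
acts$end_h   <- as.numeric(difftime(acts$end,   acts$day0, units = "hours"))

acts[, c("start_h", "end_h")]
```

An activity that crosses midnight simply gets an `end_h` above 24, which is what lets it extend past the first day boundary on the 48-hour axis. `day0`, `start_h` and `end_h` would then take the places of `Date`, `ymin` and `ymax` in the plot.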


eipi10 answered Sep 17 '22 11:09
