I have a column of durations stored as strings in a data frame. I want to convert them to an appropriate time object, probably POSIXlt. Most of the strings are easy to parse using this method:
> data <- data.frame(time.string = c(
+ "1 d 2 h 3 m 4 s",
+ "10 d 20 h 30 m 40 s",
+ "--"))
> data$time.span <- strptime(data$time.string, "%j d %H h %M m %S s")
> data$time.span
[1] "2012-01-01 02:03:04" "2012-01-10 20:30:40" NA
Missing durations are coded "--" and need to be converted to NA. This already happens but should be preserved.
The challenge is that the string drops zero-valued elements. Thus the desired value 2012-01-01 02:00:14 would be the string "1 d 2 h 14 s". However, this string parses to NA with the simple parser:
> data2 <- data.frame(time.string = c(
+ "1 d 2 h 14 s",
+ "10 d 20 h 30 m 40 s",
+ "--"))
> data2$time.span <- strptime(data2$time.string, "%j d %H h %M m %S s")
> data2$time.span
[1] NA "2012-01-10 20:30:40" NA
2012-01-) is troubling.
@mplourde definitely had the right idea with dynamically building the format string by testing which components are present in the input. Using cut(Sys.Date(), breaks='years') as the baseline for difftime was also good, but it failed to account for a critical quirk in as.POSIXct().
Note: I'm using base R 2.11; this may have been fixed in later versions.
The output of as.POSIXct() changes dramatically depending on whether or not a date component is included:
> x <- "1 d 1 h 14 m 1 s"
> y <- "1 h 14 m 1 s" # Same string, no date component
> format (x) # as specified below
[1] "%j d %H h %M m %S s"
> format (y)
[1] "%H h %M m %S s"
> as.POSIXct(x,format=format) # Including the date baselines at year start
[1] "2012-01-01 01:14:01 EST"
> as.POSIXct(y,format=format) # Excluding the date baselines at today start
[1] "2012-06-26 01:14:01 EDT"
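Thus the second argument for the difftime function should be the start of the current year when the string includes a day component, and the start of the current day when it does not. This can be accomplished by changing the breaks argument of the cut function: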
parse.time <- function (x) {
  x <- as.character(x)
  # Baseline unit for cut(): year start if a day component is present, else today
  break.unit <- ifelse(grepl("d", x), "years", "days")
  # Build the format string from whichever components appear in x
  format <- paste(c(if (grepl("d", x)) "%j d",
                    if (grepl("h", x)) "%H h",
                    if (grepl("m", x)) "%M m",
                    if (grepl("s", x)) "%S s"), collapse=" ")
  if (nchar(format) > 0) {
    difftime(as.POSIXct(x, format=format),
             cut(Sys.Date(), breaks=break.unit),
             units="hours")
  } else {NA}  # no recognised component, e.g. "--"
}
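As an alternative to the approach above (and not something taken from either answer), the date baseline can be avoided altogether by extracting each number/unit pair with a regular expression and summing seconds. This is only a sketch: parse_duration and unit_secs are hypothetical names, and it assumes "1 d" should mean 24 elapsed hours, which differs from the %j interpretation in the question.

```r
# Hypothetical helper (not from the answers above): parse strings such as
# "1 d 2 h 14 s" into a difftime measured in seconds, without strptime().
parse_duration <- function(x) {
  x <- as.character(x)
  # Seconds per unit; the unit letters are the ones used in the question.
  unit_secs <- c(d = 86400, h = 3600, m = 60, s = 1)
  total <- vapply(x, function(str) {
    # Find every "<number> <unit>" pair present in the string.
    m <- regmatches(str, gregexpr("[0-9]+ *[dhms]", str))[[1]]
    if (length(m) == 0) return(NA_real_)  # e.g. "--" becomes NA
    values <- as.numeric(sub(" *[dhms]$", "", m))
    units  <- sub("^[0-9]+ *", "", m)
    sum(values * unit_secs[units])
  }, numeric(1), USE.NAMES = FALSE)
  as.difftime(total, units = "secs")
}

parse_duration(c("1 d 2 h 14 s", "10 d 20 h 30 m 40 s", "--"))
```

Because the function is vectorised, it can be applied to the whole column at once, and missing components simply contribute nothing to the sum.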
difftime objects are time duration objects that can be added to either POSIXct or POSIXlt objects. Maybe you want to use this instead of POSIXlt?
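To illustrate the point about adding a difftime to a date-time, here is a small sketch. The start time and the variable names are made up for the example, and UTC is used only to keep daylight saving time out of the picture.

```r
# A fixed start time; UTC is chosen only to avoid daylight-saving shifts.
start <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")

# A duration of 1 day, 2 hours and 14 seconds, expressed in seconds.
span <- as.difftime(86400 + 2 * 3600 + 14, units = "secs")

# Adding a difftime to a POSIXct gives another POSIXct.
end <- start + span
format(end, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Subtracting two date-times gives a difftime back.
difftime(end, start, units = "hours")
```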
Regarding the conversion from strings to time objects, you could do something like this:
data <- data.frame(time.string = c(
"1 d 1 h",
"30 m 10 s",
"1 d 2 h 3 m 4 s",
"2 h 3 m 4 s",
"10 d 20 h 30 m 40 s",
"--"))
f <- function(x) {
  x <- as.character(x)
  # Build the format string from whichever components appear in x
  format <- paste(c(if (grepl('d', x)) '%j d',
                    if (grepl('h', x)) '%H h',
                    if (grepl('m', x)) '%M m',
                    if (grepl('s', x)) '%S s'), collapse=' ')
  if (nchar(format) > 0) {
    if (grepl('%j d', format)) {
      # %j parses '1 d' as day 1 of the year (January 1), i.e. zero elapsed days.
      # We add a day so that x = '1 d' means 24 hours.
      difftime(as.POSIXct(x, format=format) + as.difftime(1, units='days'),
               cut(Sys.Date(), breaks='years'),
               units='hours')
    } else {
      as.difftime(x, format, units='hours')
    }
  } else { NA }  # no recognised component, e.g. "--"
}
data$time.span <- sapply(data$time.string, FUN=f)
I think you will have better luck with lubridate:
From Dates and Times Made Easy with lubridate:
5.3. Durations
...
The length of a duration is invariant to leap years, leap seconds, and daylight savings time because durations are measured in seconds. Hence, durations have consistent lengths and can be easily compared to other durations. Durations are the appropriate object to use when comparing time based attributes, such as speeds, rates, and lifetimes. lubridate uses the difftime class from base R for durations. Additional difftime methods have been created to facilitate this.
...
Duration objects can be easily created with the helper functions dyears(), dweeks(), ddays(), dhours(), dminutes(), and dseconds(). The d in the title stands for duration and distinguishes these objects from period objects, which are discussed in Section 5.4. Each object creates a duration in seconds using the estimated relationships given above.
That said, I haven't (yet) found a function to parse a string into a duration.
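For what it's worth, once the numeric components have been pulled out of the string by some other means, the helper functions quoted above can be combined by hand. A minimal sketch, assuming the lubridate package is installed; the values are the ones from the "1 d 2 h 3 m 4 s" example, and how the resulting object is printed may differ between lubridate versions.

```r
library(lubridate)

# 1 day, 2 hours, 3 minutes and 4 seconds as a fixed-length duration.
span <- ddays(1) + dhours(2) + dminutes(3) + dseconds(4)

# Length of the duration in seconds.
as.numeric(span)

# Durations can be added to date-times.
ymd_hms("2012-01-01 00:00:00") + span
```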
You might also take a look at Ruby's Chronic to see how elegant time parsing can be. I haven't found a library like this for R.