
Dealing with durations defined by days, hours, minutes and seconds such as "1d 3h 2m 28s" in R

Tags: regex, time, r

I have a data frame with a character column that holds durations written as days, hours, minutes and seconds, like "1d 3h 2m 28s":

> head(status[5])
    Duration 
1 0d 20h 46m 31s 
2  2d  0h 13m 54s
3  2d  0h 13m 53s
4  0d  9h 53m 38s
5  5d 12h 17m 37s
6  0d 10h 21m 19s

I can parse out the components with a regex, but I cannot come up with a good way to convert the duration into seconds. I can gsub the vectors into an expression that would evaluate to the number of seconds, but I hit a roadblock when using eval on the results.

I could do something similar to what was recommended here but hoped to follow the regex route - even if it isn't the most efficient. I'm only dealing with parsing a variety of small HTML tables.

status$duration <- gsub("(\\d+)d\\s+(\\d+)h\\s+(\\d+)m\\s+(\\d+)s.*","\\1*86400+\\2*3600+\\3*60+\\4",as.character(status[,5]),perl=TRUE)

The above creates an expression that can be evaluated but I'm missing something when it comes to parse(text=status$duration) and a subsequent eval.

In Perl, I'm accustomed to taking the "captured variables" from a regex and using them immediately, rather than only within a replacement string. Are there similar possibilities in R?

Thank you, I'm probably missing something very simple due to fogginess of mind.

jed asked Oct 18 '11 05:10


1 Answer

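You are almost there. The problem is that the eval function is not vectorised: parse(text=...) returns one expression per string, but eval on that whole set returns only the value of the last expression. This means you need to wrap the parse and eval in an apply statement so that each element of your result vector is evaluated in turn.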
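To see what "not vectorised" means here, a minimal sketch with two made-up strings (`exprs`, `parsed`, `last_only` and `all_values` are illustrative names, not part of the question's data):

```r
# Two toy expressions as strings (illustrative, not from the question)
exprs <- c("1+1", "2*3")

# parse() returns an expression vector with one expression per string
parsed <- parse(text = exprs)
length(parsed)

# eval() on the whole expression vector evaluates each expression in turn
# but returns only the value of the last one
last_only <- eval(parsed)
last_only

# Evaluating each string separately keeps every result
all_values <- vapply(exprs, function(x) eval(parse(text = x)),
                     numeric(1), USE.NAMES = FALSE)
all_values
```

Here `last_only` holds only the result of "2*3", while `all_values` holds one result per string.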

First recreate your data:

status <- c("0d 20h 46m 31s", "2d 0h 13m 54s", "2d 0h 13m 53s", 
       "0d 9h 53m 38s", "5d 12h 17m 37s", "0d 10h 21m 19s")

duration <- gsub("(\\d+)d\\s+(\\d+)h\\s+(\\d+)m\\s+(\\d+)s.*","\\1*86400+\\2*3600+\\3*60+\\4",
                 as.character(status),perl=TRUE)
duration
[1] "0*86400+20*3600+46*60+31" "2*86400+0*3600+13*60+54"  "2*86400+0*3600+13*60+53" 
[4] "0*86400+9*3600+53*60+38"  "5*86400+12*3600+17*60+37" "0*86400+10*3600+21*60+19"

To evaluate a single element:

eval(parse(text=duration[1]))
[1] 74791

Wrap this in sapply or your favourite apply statement to evaluate all of the strings:

sapply(duration, function(x)eval(parse(text=x)))

0*86400+20*3600+46*60+31  2*86400+0*3600+13*60+54 
                   74791                   173634 
 2*86400+0*3600+13*60+53  0*86400+9*3600+53*60+38 
                  173633                    35618 
5*86400+12*3600+17*60+37 0*86400+10*3600+21*60+19 
                  476257                    37279 
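
On the question of using captured groups directly rather than only in a replacement string: one possible route, sketched here as an alternative and not as the method used above, is base R's regexec() together with regmatches(), which returns the captures as character vectors and avoids eval altogether. The names `pattern`, `captures`, `weights` and `seconds` are made up for this sketch.

```r
status <- c("0d 20h 46m 31s", "2d 0h 13m 54s", "2d 0h 13m 53s",
            "0d 9h 53m 38s", "5d 12h 17m 37s", "0d 10h 21m 19s")

pattern <- "(\\d+)d\\s+(\\d+)h\\s+(\\d+)m\\s+(\\d+)s"

# regexec() records the position of the full match and of each capture group;
# regmatches() turns those into a list with one character vector per string
captures <- regmatches(status, regexec(pattern, status))

# Seconds per day, hour, minute and second
weights <- c(86400, 3600, 60, 1)

# Drop the full match (first element), convert the four groups to numbers
# and sum them weighted by their size in seconds
seconds <- vapply(captures,
                  function(m) sum(as.numeric(m[-1]) * weights),
                  numeric(1))
seconds
```

This gives the same six totals as the sapply call above, without names. Note that a string that does not match the pattern would silently come out as 0 in this sketch, so untrusted input may need an explicit check.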

Andrie answered Nov 14 '22 21:11