
R: transform irregular time strings

I have two time series in different data frames, each with its own irregular format, but the problem is the same for both. I want to extract only the hours, minutes, seconds and milliseconds.

The time series look like this:

ts1

08:27:23,445
08:27:24,280
08:27:25,115
...

I tried

strptime("08:27:23,445", "%H:%M:%OS")
[1] "2013-05-23 08:27:23"

I lose the millisecond information and get a date that is useless to me.
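A minimal sketch of one fix, assuming the comma in ts1 is always the decimal separator in front of the milliseconds: R's strptime() reads fractional seconds with %OS, but only when the decimal mark is a period, so swapping the comma for a period first keeps the milliseconds. Because no date is given, strptime() fills in the current date; the sketch simply ignores it and reads the time fields of the POSIXlt result.

```r
x <- c("08:27:23,445", "08:27:24,280", "08:27:25,115")

# %OS accepts fractional seconds on input, but only with "." as the decimal mark
p <- strptime(sub(",", ".", x, fixed = TRUE), "%H:%M:%OS", tz = "UTC")

# The time-of-day fields of the POSIXlt result; the (current) date is not used
p$hour   # hours
p$min    # minutes
p$sec    # seconds including the fractional part, e.g. 23.445
```

Printing p itself still hides the fractional part by default, which is why it looked as if the milliseconds were lost.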

ts2

Fri Apr 19 2013 08:39:41 GMT+0200
Fri Apr 19 2013 08:39:43 GMT+0200
Fri Apr 19 2013 08:39:45 GMT+0200
...

I tried

strptime("Fri Apr 19 2013 08:39:41 GMT+0200", "%a %b %d %Y %H:%M:%S %Z")
[1] NA
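One likely cause of the NA: in R's strptime(), %Z is documented as output-only, so it cannot be used to read "GMT+0200" (and %a and %b also depend on the current locale). Since only the time of day is needed, a simpler route is to pull out the HH:MM:SS field with a regular expression and append ",000" so it matches the ts1 layout. This is a sketch that assumes every entry contains exactly one HH:MM:SS field and that all rows share the same GMT+0200 offset; ts2_fmt is an illustrative name.

```r
x <- c("Fri Apr 19 2013 08:39:41 GMT+0200",
       "Fri Apr 19 2013 08:39:43 GMT+0200",
       "Fri Apr 19 2013 08:39:45 GMT+0200")

# Extract the first HH:MM:SS match from each string
hms <- regmatches(x, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", x))

# Add a zero millisecond part so ts2 has the same "HH:MM:SS,mmm" layout as ts1
ts2_fmt <- paste0(hms, ",000")
ts2_fmt
# [1] "08:39:41,000" "08:39:43,000" "08:39:45,000"
```

Dropping the offset is only safe if ts1 was recorded in the same local time as ts2.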

In the end, I want to transform ts1 and ts2 into new time series that share the same format (with milliseconds), for example:

ts1

08:27:23,445

ts2

08:39:41,000

A common format is important because I want to work with both time series later on, e.g. match them, calculate differences, etc.
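For matching and differences, one option (a sketch, not the only way) is to turn both series into plain numbers, seconds since midnight, once they share the "HH:MM:SS,mmm" layout. to_secs() below is a hypothetical helper; it assumes the milliseconds always have three digits and that both series refer to the same day and time zone.

```r
# Hypothetical helper: "HH:MM:SS,mmm" -> seconds since midnight (numeric)
to_secs <- function(x) {
  parts <- strsplit(as.character(x), "[:,]")   # split on ":" and ","
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
  }, numeric(1))
}

s1 <- to_secs(c("08:27:23,445", "08:27:24,280", "08:27:25,115"))
s2 <- to_secs(c("08:39:41,000", "08:39:43,000"))

# Differences between consecutive samples of ts1, in seconds
d1 <- diff(s1)

# For each ts2 time, the index of the closest ts1 time
nearest <- vapply(s2, function(t) which.min(abs(s1 - t)), integer(1))
```

Working with numbers rather than formatted strings makes both the differences and the nearest-neighbour matching simple vector operations.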

Thank you for your help!

UPDATE: added dput output

Both datasets are very long, which is why I cut them down.

ts1

structure(list(t = structure(1:9, .Label = c("08:27:23,445",
    "08:27:24,280", "08:27:25,115", "08:27:25,960", "08:27:26,780",
    "08:27:27,540", "08:27:28,295", "08:27:29,075", "08:27:29,910"), class = "factor")),
    .Names = "t", row.names = c(NA, -9L), class = "data.frame")

ts2

structure(list(t = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L),
    .Label = c("Fri Apr 19 2013 08:39:41 GMT+0200", "Fri Apr 19 2013 08:39:43 GMT+0200",
    "Fri Apr 19 2013 08:39:45 GMT+0200", "Fri Apr 19 2013 08:39:49 GMT+0200",
    "Fri Apr 19 2013 08:39:51 GMT+0200", "Fri Apr 19 2013 08:39:53 GMT+0200",
    "Fri Apr 19 2013 08:39:59 GMT+0200", "Fri Apr 19 2013 08:40:05 GMT+0200",
    "Fri Apr 19 2013 08:40:06 GMT+0200"), class = "factor")),
    .Names = "t", row.names = c(NA, -9L), class = "data.frame")
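The dput output shows that both t columns are factors, and the ts2 codes (1L, ..., 6L, 6L, ...) mean that one timestamp appears twice while the ninth label is never used. A sketch of converting the column to character before any string handling, and of locating the repeated row, using a hand-rebuilt copy of ts2 (same labels and codes as the dput, built with factor() instead of structure()); labs and dup_rows are illustrative names.

```r
labs <- c("Fri Apr 19 2013 08:39:41 GMT+0200", "Fri Apr 19 2013 08:39:43 GMT+0200",
          "Fri Apr 19 2013 08:39:45 GMT+0200", "Fri Apr 19 2013 08:39:49 GMT+0200",
          "Fri Apr 19 2013 08:39:51 GMT+0200", "Fri Apr 19 2013 08:39:53 GMT+0200",
          "Fri Apr 19 2013 08:39:59 GMT+0200", "Fri Apr 19 2013 08:40:05 GMT+0200",
          "Fri Apr 19 2013 08:40:06 GMT+0200")

# Same integer codes as in the dput: level 6 is used twice, level 9 never
ts2 <- data.frame(t = factor(labs[c(1, 2, 3, 4, 5, 6, 6, 7, 8)], levels = labs))

# Convert factor -> character; as.numeric() on a factor would return the codes instead
ts2$t <- as.character(ts2$t)

# Rows whose timestamp already occurred earlier
dup_rows <- which(duplicated(ts2$t))
dup_rows
# [1] 7
```

A repeated timestamp gives a zero difference and can make matching ambiguous, so it may be worth checking for before comparing the two series.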
asked Nov 13 '22 04:11 by Markus Germar



1 Answer

Below is a quick sapply-based approach that might help, IF you have a fixed zero point. For example, you might only want to compare activities between 0:00 (midnight) and 23:59:59,999 on the same day. If so, you can convert each time into a single number (decimal minutes since midnight in my example) and then see, for example, how long a single activity takes.

Using your example for ts1:

Make a vector of times (as character strings):

time <- c("08:27:23,445",
          "08:27:24,280",
          "08:27:25,115")

Replace the comma with a colon so that every field can be split on the same separator:

time.new <- gsub(",", ":", time)

Compute decimal minutes since midnight:

time.mins <- sapply(strsplit(as.character(time.new), ":"),
                    function(x) {
                      x <- as.numeric(x)
                      # hours*60 + minutes + seconds/60 + milliseconds/60000
                      x[1] * 60 + x[2] + x[3] / 60 + x[4] / 60000
                    })
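As a sketch, the same decimal-minute computation can also be done without a per-element function: splitting on the regular expression "[:,]" handles both separators at once (so the gsub() step is not needed), and the pieces can be treated as a four-column matrix. This assumes every string has exactly four fields.

```r
time <- c("08:27:23,445",
          "08:27:24,280",
          "08:27:25,115")

# One row per time: hours, minutes, seconds, milliseconds
m <- matrix(as.numeric(unlist(strsplit(time, "[:,]"))), ncol = 4, byrow = TRUE)

# Decimal minutes since midnight, same formula as in the sapply() version
time.mins <- m[, 1] * 60 + m[, 2] + m[, 3] / 60 + m[, 4] / 60000
```

The vectorized form avoids calling an R function once per row, which may matter for very long series.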

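The result looks like this if you bind the two vectors together (note that cbind() on a character and a numeric vector returns a character matrix, not a data frame):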

> df <- cbind(time, time.mins)
> df
     time           time.mins         
[1,] "08:27:23,445" "507.39075"       
[2,] "08:27:24,280" "507.404666666667"
[3,] "08:27:25,115" "507.418583333333"

I imagine this is most helpful for something like click-through rates, or whenever you never need to handle a gap of more than 24 hours.
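To get from decimal minutes to "how long a single activity takes", the gaps between consecutive entries can be taken with diff(). The sketch below also makes a hypothetical extension: it assumes that a negative gap means the clock passed midnight and adds one day (1440 minutes) in that case, which only works if no real gap exceeds 24 hours. The last two input values are made up to illustrate the wrap-around.

```r
# Decimal minutes since midnight; 1439.5 and 0.25 are hypothetical entries
# just before and just after midnight
time.mins <- c(507.39075, 507.404666666667, 1439.5, 0.25)

gaps <- diff(time.mins)                      # minutes between consecutive entries
gaps[gaps < 0] <- gaps[gaps < 0] + 24 * 60   # assume a negative gap wrapped past midnight
gaps.ms <- gaps * 60000                      # the same gaps in milliseconds
```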

answered Nov 15 '22 06:11 by mCorey
