Find time to nearest occurrence of particular value for each row

Question

Say I have a data table:

dt <- data.table(
        datetime = seq(as.POSIXct("2016-01-01 00:00:00"),as.POSIXct("2016-01-01 10:00:00"), by = "1 hour"),
        ObType = c("A","A","B","B","B","B","A","A","B","A","A")
)

dt
                   datetime ObType
     1: 2016-01-01 00:00:00      A
     2: 2016-01-01 01:00:00      A
     3: 2016-01-01 02:00:00      B
     4: 2016-01-01 03:00:00      B
     5: 2016-01-01 04:00:00      B
     6: 2016-01-01 05:00:00      B
     7: 2016-01-01 06:00:00      A
     8: 2016-01-01 07:00:00      A
     9: 2016-01-01 08:00:00      B
    10: 2016-01-01 09:00:00      A
    11: 2016-01-01 10:00:00      A

What I need to do is wherever the ObType is "B", I need to find the time to the nearest ObType "A" on either side. So the result should look like (in hours):

               datetime ObType timeLag timeLead
 1: 2016-01-01 00:00:00      A      NA       NA
 2: 2016-01-01 01:00:00      A      NA       NA
 3: 2016-01-01 02:00:00      B       1        4
 4: 2016-01-01 03:00:00      B       2        3
 5: 2016-01-01 04:00:00      B       3        2
 6: 2016-01-01 05:00:00      B       4        1
 7: 2016-01-01 06:00:00      A      NA       NA
 8: 2016-01-01 07:00:00      A      NA       NA
 9: 2016-01-01 08:00:00      B       1        1
10: 2016-01-01 09:00:00      A      NA       NA
11: 2016-01-01 10:00:00      A      NA       NA

I usually use data.table, but non data.table solutions are also fine.

Thanks!

Lyss

d.b · Accepted Answer

dt$timelag = NA
dt$timelead = NA

A = split(dt, dt$ObType)$A
B = split(dt, dt$ObType)$B

A_time_up = sort(A$datetime)
A_time_dn = sort(A$datetime, decreasing = TRUE)

B$timelag = apply(B, 1, function(x) 
    A_time_up[which(x[1] < A_time_up)[1]]
)

B$timelead = apply(B, 1, function(x) 
    A_time_dn[which(x[1] > A_time_dn)[1]]
)

B$timelag = (B$timelag - as.numeric(B$datetime))/(3600)
B$timelead = (as.numeric(B$datetime) - B$timelead)/(3600)

rbind(A,B)

Matt Dowle · Answer

The approach I hinted at using roll= :

X = dt[ObType=="A"]
X
              datetime ObType
1: 2016-01-01 00:00:00      A
2: 2016-01-01 01:00:00      A
3: 2016-01-01 06:00:00      A
4: 2016-01-01 07:00:00      A
5: 2016-01-01 09:00:00      A
6: 2016-01-01 10:00:00      A

dt[ObType=="B", Lag:=X[.SD,on="datetime",roll=Inf,i.datetime-x.datetime]]
dt[ObType=="B", Lead:=X[.SD,on="datetime",roll=-Inf,x.datetime-i.datetime]]
dt[ObType=="B", Nearest:=X[.SD,on="datetime",roll="nearest",x.datetime-i.datetime]]
dt
               datetime ObType      Lag     Lead     Nearest
 1: 2016-01-01 00:00:00      A NA hours NA hours    NA hours
 2: 2016-01-01 01:00:00      A NA hours NA hours    NA hours
 3: 2016-01-01 02:00:00      B  1 hours  4 hours    -1 hours
 4: 2016-01-01 03:00:00      B  2 hours  3 hours    -2 hours
 5: 2016-01-01 04:00:00      B  3 hours  2 hours     2 hours
 6: 2016-01-01 05:00:00      B  4 hours  1 hours     1 hours
 7: 2016-01-01 06:00:00      A NA hours NA hours    NA hours
 8: 2016-01-01 07:00:00      A NA hours NA hours    NA hours
 9: 2016-01-01 08:00:00      B  1 hours  1 hours    -1 hours
10: 2016-01-01 09:00:00      A NA hours NA hours    NA hours
11: 2016-01-01 10:00:00      A NA hours NA hours    NA hours

One advantage of roll= is that you can apply a staleness limit just by changing the Inf to the limit of time you wish to join within. It's the time difference that the limit applies to, not the number of rows. Inf just means don't limit. The roll= sign indicates whether to look forwards or backwards (lead or lag).

Another advantage is that roll= is fast.

Find time to nearest occurrence of particular value for each row

Tags:

time

r

data.table

Bucket

2 Answers

d.b

Matt Dowle

Recent Activity

Donate For Us

Find time to nearest occurrence of particular value for each row

Tags:

time

r

data.table

Bucket

2 Answers

d.b

Matt Dowle

Related questions

Recent Activity

Donate For Us