Say I have a data table:
dt <- data.table(
datetime = seq(as.POSIXct("2016-01-01 00:00:00"),as.POSIXct("2016-01-01 10:00:00"), by = "1 hour"),
ObType = c("A","A","B","B","B","B","A","A","B","A","A")
)
dt
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 02:00:00 B
4: 2016-01-01 03:00:00 B
5: 2016-01-01 04:00:00 B
6: 2016-01-01 05:00:00 B
7: 2016-01-01 06:00:00 A
8: 2016-01-01 07:00:00 A
9: 2016-01-01 08:00:00 B
10: 2016-01-01 09:00:00 A
11: 2016-01-01 10:00:00 A
What I need to do is wherever the ObType is "B", I need to find the time to the nearest ObType "A" on either side. So the result should look like (in hours):
datetime ObType timeLag timeLead
1: 2016-01-01 00:00:00 A NA NA
2: 2016-01-01 01:00:00 A NA NA
3: 2016-01-01 02:00:00 B 1 4
4: 2016-01-01 03:00:00 B 2 3
5: 2016-01-01 04:00:00 B 3 2
6: 2016-01-01 05:00:00 B 4 1
7: 2016-01-01 06:00:00 A NA NA
8: 2016-01-01 07:00:00 A NA NA
9: 2016-01-01 08:00:00 B 1 1
10: 2016-01-01 09:00:00 A NA NA
11: 2016-01-01 10:00:00 A NA NA
I usually use data.table, but non data.table solutions are also fine.
Thanks!
Lyss
dt$timelag = NA
dt$timelead = NA
A = split(dt, dt$ObType)$A
B = split(dt, dt$ObType)$B
A_time_up = sort(A$datetime)
A_time_dn = sort(A$datetime, decreasing = TRUE)
B$timelag = apply(B, 1, function(x)
A_time_up[which(x[1] < A_time_up)[1]]
)
B$timelead = apply(B, 1, function(x)
A_time_dn[which(x[1] > A_time_dn)[1]]
)
B$timelag = (B$timelag - as.numeric(B$datetime))/(3600)
B$timelead = (as.numeric(B$datetime) - B$timelead)/(3600)
rbind(A,B)
The approach I hinted at using roll=
:
X = dt[ObType=="A"]
X
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 06:00:00 A
4: 2016-01-01 07:00:00 A
5: 2016-01-01 09:00:00 A
6: 2016-01-01 10:00:00 A
dt[ObType=="B", Lag:=X[.SD,on="datetime",roll=Inf,i.datetime-x.datetime]]
dt[ObType=="B", Lead:=X[.SD,on="datetime",roll=-Inf,x.datetime-i.datetime]]
dt[ObType=="B", Nearest:=X[.SD,on="datetime",roll="nearest",x.datetime-i.datetime]]
dt
datetime ObType Lag Lead Nearest
1: 2016-01-01 00:00:00 A NA hours NA hours NA hours
2: 2016-01-01 01:00:00 A NA hours NA hours NA hours
3: 2016-01-01 02:00:00 B 1 hours 4 hours -1 hours
4: 2016-01-01 03:00:00 B 2 hours 3 hours -2 hours
5: 2016-01-01 04:00:00 B 3 hours 2 hours 2 hours
6: 2016-01-01 05:00:00 B 4 hours 1 hours 1 hours
7: 2016-01-01 06:00:00 A NA hours NA hours NA hours
8: 2016-01-01 07:00:00 A NA hours NA hours NA hours
9: 2016-01-01 08:00:00 B 1 hours 1 hours -1 hours
10: 2016-01-01 09:00:00 A NA hours NA hours NA hours
11: 2016-01-01 10:00:00 A NA hours NA hours NA hours
One advantage of roll=
is that you can apply a staleness limit just by changing the Inf
to the limit of time you wish to join within. It's the time difference that the limit applies to, not the number of rows. Inf
just means don't limit. The roll=
sign indicates whether to look forwards or backwards (lead or lag).
Another advantage is that roll=
is fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With