
Rolling joins: roll forwards and backwards

Tags: r, data.table

data.table is awesome because I can do rolling joins, and even rolling joins within groups!

library(data.table)
set.seed(42)
metrics <- data.frame(
  ID=c(rep(1, 10), rep(2,5), rep(3,5)),
  Time=c(1:10, 4:8, 8:12),
  val1=runif(20),
  val2=runif(20),
  val3=runif(20),
  val4=runif(20)
  )
metrics <- data.table(metrics[sample(1:nrow(metrics), 15),], key=c('ID', 'Time'))
calendar <- data.table(expand.grid(ID=1:3, Time=1:12), key=c('ID', 'Time'))

metrics[calendar,roll=TRUE]

However, this isn't awesome enough for me. This data.table still has NAs:

> metrics[calendar,roll=TRUE]
    ID Time      val1      val2      val3        val4
 1:  1    1 0.9148060 0.9040314 0.3795592 0.675607275
 2:  1    2 0.9370754 0.1387102 0.4357716 0.982817198
 3:  1    3 0.9370754 0.1387102 0.4357716 0.982817198
 4:  1    4 0.8304476 0.9466682 0.9735399 0.566488424
 5:  1    5 0.8304476 0.9466682 0.9735399 0.566488424
 6:  1    6 0.5190959 0.5142118 0.9575766 0.189473935
 7:  1    7 0.7365883 0.3902035 0.8877549 0.271286615
 8:  1    8 0.7365883 0.3902035 0.8877549 0.271286615
 9:  1    9 0.6569923 0.4469696 0.9709666 0.693204820
10:  1   10 0.7050648 0.8360043 0.6188382 0.240544740
11:  1   11 0.7050648 0.8360043 0.6188382 0.240544740
12:  1   12 0.7050648 0.8360043 0.6188382 0.240544740
13:  2    1        NA        NA        NA          NA
14:  2    2        NA        NA        NA          NA
15:  2    3        NA        NA        NA          NA
16:  2    4 0.4577418 0.7375956 0.3334272 0.042988796
17:  2    5 0.7191123 0.8110551 0.3467482 0.140479094
18:  2    6 0.9346722 0.3881083 0.3984854 0.216385415
19:  2    7 0.2554288 0.6851697 0.7846928 0.479398564
20:  2    8 0.2554288 0.6851697 0.7846928 0.479398564
21:  2    9 0.2554288 0.6851697 0.7846928 0.479398564
22:  2   10 0.2554288 0.6851697 0.7846928 0.479398564
23:  2   11 0.2554288 0.6851697 0.7846928 0.479398564
24:  2   12 0.2554288 0.6851697 0.7846928 0.479398564
25:  3    1        NA        NA        NA          NA
26:  3    2        NA        NA        NA          NA
27:  3    3        NA        NA        NA          NA
28:  3    4        NA        NA        NA          NA
29:  3    5        NA        NA        NA          NA
30:  3    6        NA        NA        NA          NA
31:  3    7        NA        NA        NA          NA
32:  3    8 0.9400145 0.8329161 0.7487954 0.719355838
33:  3    9 0.9400145 0.8329161 0.7487954 0.719355838
34:  3   10 0.1174874 0.2076590 0.1712643 0.375489965
35:  3   11 0.4749971 0.9066014 0.2610880 0.514407708
36:  3   12 0.5603327 0.6117786 0.5144129 0.001570554
    ID Time      val1      val2      val3        val4

I could fill these NAs using zoo::na.locf with fromLast=TRUE, but that's not very fun. Can anyone think of an elegant way to fill the remaining NAs by rolling backward (after rolling forward) during the data.table join itself?
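For reference, the post-join fill mentioned above might look roughly like this. It is only a sketch: the toy table `dt` and the `value_cols` vector are made up for illustration, and it assumes the zoo package is installed.

```r
library(data.table)
library(zoo)

# Hypothetical toy data: what a forward-rolled join can leave behind,
# i.e. leading NAs at the start of each ID
dt <- data.table(
  ID   = c(1L, 1L, 1L, 2L, 2L, 2L),
  Time = c(1L, 2L, 3L, 1L, 2L, 3L),
  val1 = c(NA, 0.5, 0.5, NA, NA, 0.9)
)

# Columns to fill (hypothetical name; list every value column here)
value_cols <- "val1"

# Carry the next observation backward within each ID.
# na.rm = FALSE keeps the vector length unchanged if a group ends in NA.
dt[, (value_cols) := lapply(.SD, na.locf, fromLast = TRUE, na.rm = FALSE),
   by = ID, .SDcols = value_cols]
```

This works, but it is a second pass over the joined result rather than part of the join.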

Zach asked Apr 08 '13 22:04


2 Answers

This is possible in data.table version 1.8.8 released March 2013:

metrics[calendar, roll=TRUE, rollends=c(TRUE, TRUE)]

From the data.table NEWS file:

In addition to TRUE/FALSE, 'roll' may now be a positive number (roll forwards/LOCF) or negative number (roll backwards/NOCB). A finite number limits the distance a value is rolled (limited staleness). roll=TRUE and roll=+Inf are equivalent. 'rollends' is a new parameter holding two logicals. The first observation is rolled backwards if the first value of rollends is TRUE. The last observation is rolled forwards if the second value of rollends is TRUE. If roll is a finite number, the same limit applies to the ends. New value roll='nearest' joins to the nearest value (either backwards or forwards) when the value falls in a gap, and to the end value according to 'rollends'. 'rolltolast' has been deprecated. For backwards compatibility it is converted to {roll=TRUE;rollends=c(FALSE,FALSE)}.
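To see how these options interact, here is a small sketch on made-up data. The tables `obs` and `grid` are hypothetical and use a single key column; the same arguments apply to the grouped ID/Time join in the question. It assumes data.table 1.8.8 or later.

```r
library(data.table)

# Hypothetical toy data: observations exist only at Time 3 and Time 6
obs  <- data.table(Time = c(3L, 6L), val = c(10, 20), key = "Time")
grid <- data.table(Time = 1:8, key = "Time")

# Roll forwards (LOCF); times before the first observation stay NA
locf <- obs[grid, roll = TRUE]$val

# Roll backwards (NOCB); times after the last observation stay NA
nocb <- obs[grid, roll = -Inf]$val

# Roll forwards, and also roll the first observation backwards
both <- obs[grid, roll = TRUE, rollends = c(TRUE, TRUE)]$val

# Join each gap to whichever observation is closer
nearest <- obs[grid, roll = "nearest"]$val

print(data.table(Time = grid$Time, locf, nocb, both, nearest))
```

The `both` column is the one with no NAs left: gaps are filled forwards, and the leading rows take the first observation.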

As always, to download the most up-to-date version of data.table, see Installation.

Josh O'Brien answered Nov 20 '22 07:11


metrics[calendar, roll = TRUE, rollends = c(TRUE, TRUE)]

eddi answered Nov 20 '22 05:11