Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using roll=TRUE with allow.cartesian=TRUE

What is the best way to do a Cartesian join and use the roll forward feature, but applying the roll feature to each alternative series from the joining table, rather than the whole series.

Best explained with an example:

library(data.table)
A = data.table(x = c(1,2,3,4,5), y = letters[1:5])
B = data.table(x = c(1,2,3,1,4), f = c("Alice","Alice","Alice", "Bob","Bob"), z = 101:105)
setkey(B,x)
C = B[A, roll = TRUE, allow.cartesian=TRUE, rollends = FALSE]

A
B
C[f == "Alice"]
C[f == "Bob"]
C

So we have the two starting tables:

> A
   x y
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
> B
   x     f   z
1: 1 Alice 101
2: 1   Bob 104
3: 2 Alice 102
4: 3 Alice 103
5: 4   Bob 105

And I want to join these so that I have for each x value in A I have both and Alice and Bob row, rolling forwards if either are missing (but not rolling past the end). This doesn't quite work as I've currently got it:

> C[f == "Alice"]
   x     f   z y
1: 1 Alice 101 a
2: 2 Alice 102 b
3: 3 Alice 103 c
> C[f == "Bob"]
   x   f   z y
1: 1 Bob 104 a
2: 4 Bob 105 d
> C
   x     f   z y
1: 1 Alice 101 a
2: 1   Bob 104 a
3: 2 Alice 102 b
4: 3 Alice 103 c
5: 4   Bob 105 d
6: 5    NA  NA e

Because Alice is there for 2 and 3, it doesn't roll Bob's data forwards. I need the extra rows for Bob so I want to get:

> C[f == "Alice"]
   x     f   z y
1: 1 Alice 101 a
2: 2 Alice 102 b
3: 3 Alice 103 c
> C[f == "Bob"]
   x   f   z y
1: 1 Bob 104 a
2: 2 Bob 104 b  # THESE ROWS ARE MISSING
3: 3 Bob 104 c  # THESE ROWS ARE MISSING
4: 4 Bob 105 d
> C
   x     f   z y
1: 1 Alice 101 a
2: 1   Bob 104 a
3: 2 Alice 102 b
4: 2   Bob 104 b  # THESE ROWS ARE MISSING
5: 3 Alice 103 c
6: 3   Bob 104 c  # THESE ROWS ARE MISSING
7: 4   Bob 105 d
8: 5    NA  NA e
like image 568
Corvus Avatar asked Apr 04 '14 12:04

Corvus


1 Answers

Here you go:

setkey(B, f, x)

setkey(B[CJ(unique(f), unique(x)), allow.cartesian = T,
         roll = T, rollends = c(F,F)], x)[A, allow.cartesian = T]
#   x     f   z y
#1: 1 Alice 101 a
#2: 1   Bob 104 a
#3: 2 Alice 102 b
#4: 2   Bob 104 b
#5: 3 Alice 103 c
#6: 3   Bob 104 c
#7: 4 Alice  NA d
#8: 4   Bob 105 d
#9: 5    NA  NA e

And you can filter out the NA's to suit your needs.

like image 128
eddi Avatar answered Nov 15 '22 05:11

eddi