I have two data.tables, DT and L:
> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9,key="x")
> L=data.table(yv=c(1L:8L,12L),lu=c(letters[8:1],letters[12]),key="yv")
> DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
> L
yv lu
1: 1 h
2: 2 g
3: 3 f
4: 4 e
5: 5 d
6: 6 c
7: 7 b
8: 8 a
9: 12 l
I would like to independently look up the corresponding value of lu from L for column y and for column v in DT. The following syntax provides the correct result, but is cumbersome to generate and then understand at a glance later:
> L[setkey(L[setkey(DT,y)],v)][,list(x,y=yv.1,v=yv,lu.1=lu.1,lu.2=lu)]
x y v lu.1 lu.2
1: a 1 1 h h
2: a 2 3 g f
3: a 3 6 f c
4: b 4 1 e h
5: b 5 3 d f
6: b 6 6 c c
7: c 7 1 b h
8: c 8 3 a f
9: c 9 6 NA c
(Edit: original post had L[setkey(L[setkey(DT,y)],v)][,list(x,y=yv,v=yv.1,lu.1=lu,lu.2=lu.1)]
above, which incorrectly mixed up the y and v columns and looked up values.)
In SQL this would be simple/straightforward:
SELECT DT.*, L1.lu AS lu1, L2.lu AS lu2
FROM DT
LEFT JOIN L AS L1 ON DT.y = L1.yv
LEFT JOIN L AS L2 ON DT.v = L2.yv
Is there a more elegant way to use data.table to perform multiple joins? Note that I'm joining one table to another table twice in this example, but I am also interested in joining one table to multiple different tables.
How to Join Multiple Data Frames in R?, you can find it useful to connect many data frames in R. Fortunately, the left join() function from the dplyr package makes this simple to accomplish. We can easily conduct two left joins, one after the other, to combine all three data frames.
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.
Great question. One trick is that i
doesn't have to be keyed. Only x
must be keyed.
There might be better ways. How about this:
> cbind( L[DT[,list(y)]], L[DT[,list(v)]], DT )
yv lu yv lu x y v
1: 1 h 1 h a 1 1
2: 3 f 2 g a 3 2
3: 6 c 3 f a 6 3
4: 1 h 4 e b 1 4
5: 3 f 5 d b 3 5
6: 6 c 6 c b 6 6
7: 1 h 7 b c 1 7
8: 3 f 8 a c 3 8
9: 6 c 9 NA c 6 9
or, to illustrate, this is the same :
> cbind( L[J(DT$y)], L[J(DT$v)], DT )
yv lu yv lu x y v
1: 1 h 1 h a 1 1
2: 3 f 2 g a 3 2
3: 6 c 3 f a 6 3
4: 1 h 4 e b 1 4
5: 3 f 5 d b 3 5
6: 6 c 6 c b 6 6
7: 1 h 7 b c 1 7
8: 3 f 8 a c 3 8
9: 6 c 9 NA c 6 9
merge
could also be used, if the following feature request was implemented :
FR#2033 Add by.x and by.y to merge.data.table
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With