Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple joins/merges with data.tables

I have two data.tables, DT and L:

> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9,key="x")
> L=data.table(yv=c(1L:8L,12L),lu=c(letters[8:1],letters[12]),key="yv")

> DT
   x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9

> L
   yv lu
1:  1  h
2:  2  g
3:  3  f
4:  4  e
5:  5  d
6:  6  c
7:  7  b
8:  8  a
9: 12  l

I would like to independently look up the corresponding value of lu from L for column y and for column v in DT. The following syntax provides the correct result, but is cumbersome to generate and then understand at a glance later:

> L[setkey(L[setkey(DT,y)],v)][,list(x,y=yv.1,v=yv,lu.1=lu.1,lu.2=lu)]
   x y v lu.1 lu.2
1: a 1 1    h    h
2: a 2 3    g    f
3: a 3 6    f    c
4: b 4 1    e    h
5: b 5 3    d    f
6: b 6 6    c    c
7: c 7 1    b    h
8: c 8 3    a    f
9: c 9 6   NA    c

(Edit: original post had L[setkey(L[setkey(DT,y)],v)][,list(x,y=yv,v=yv.1,lu.1=lu,lu.2=lu.1)] above, which incorrectly mixed up the y and v columns and looked up values.)

In SQL this would be simple/straightforward:

SELECT DT.*, L1.lu AS lu1, L2.lu AS lu2
FROM DT
LEFT JOIN L AS L1 ON DT.y = L1.yv
LEFT JOIN L AS L2 ON DT.v = L2.yv

Is there a more elegant way to use data.table to perform multiple joins? Note that I'm joining one table to another table twice in this example, but I am also interested in joining one table to multiple different tables.

like image 432
dnlbrky Avatar asked Jan 02 '13 17:01

dnlbrky


People also ask

Can we join 3 tables in R?

How to Join Multiple Data Frames in R?, you can find it useful to connect many data frames in R. Fortunately, the left join() function from the dplyr package makes this simple to accomplish. We can easily conduct two left joins, one after the other, to combine all three data frames.

How do I join two data tables in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.


1 Answers

Great question. One trick is that i doesn't have to be keyed. Only x must be keyed.

There might be better ways. How about this:

> cbind( L[DT[,list(y)]], L[DT[,list(v)]], DT )
   yv lu yv lu x y v
1:  1  h  1  h a 1 1
2:  3  f  2  g a 3 2
3:  6  c  3  f a 6 3
4:  1  h  4  e b 1 4
5:  3  f  5  d b 3 5
6:  6  c  6  c b 6 6
7:  1  h  7  b c 1 7
8:  3  f  8  a c 3 8
9:  6  c  9 NA c 6 9

or, to illustrate, this is the same :

> cbind( L[J(DT$y)], L[J(DT$v)], DT )
   yv lu yv lu x y v
1:  1  h  1  h a 1 1
2:  3  f  2  g a 3 2
3:  6  c  3  f a 6 3
4:  1  h  4  e b 1 4
5:  3  f  5  d b 3 5
6:  6  c  6  c b 6 6
7:  1  h  7  b c 1 7
8:  3  f  8  a c 3 8
9:  6  c  9 NA c 6 9

merge could also be used, if the following feature request was implemented :

FR#2033 Add by.x and by.y to merge.data.table

like image 122
Matt Dowle Avatar answered Sep 21 '22 16:09

Matt Dowle