Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merging tables with different column names

If I understand correctly, by default data.table merges two tables by comparing columns which are set as keys and have same names. How should I write if I have tables with different column names? For example:

set.seed(123)
DT1<-data.table(col1=sample(letters,5,replace=TRUE),col2=sample(LETTERS[1:5],5,replace=TRUE),col3=sample(1:2,5,replace=TRUE))
DT2<-data.table(col4=sample(1:3,10,replace=TRUE),col5=sample(LETTERS[1:5],10,replace=TRUE),col6=sample(1:100,10,replace=TRUE))
(DT1)
(DT2)

> (DT1)
   col1 col2 col3
1:    h    A    2
2:    u    C    1
3:    k    E    2
4:    w    C    2
5:    y    C    1
> (DT2)
    col4 col5 col6
 1:    3    D   48
 2:    1    C   76
 3:    1    C   22
 4:    1    B   32
 5:    3    A   24
 6:    3    E   15
 7:    3    E   42
 8:    2    D   42
 9:    3    D   37
10:    2    A   16

What should I write in DT1[ to make a merge selecting only rows with col2==col5 & col3==col4? Here is an expected output:

   col1 col2 col3 col4 col5 col6
    h    A    2    2    A   16
    u    C    1    1    C   76
    u    C    1    1    C   22
    y    C    1    1    C   76
    y    C    1    1    C   22

Thanks in advance!

like image 277
Vasily A Avatar asked Apr 16 '13 21:04

Vasily A


1 Answers

Using data.table's subset based joins along with the recently implemented on= argument and nomatch=0L, this is simply:

DT2[DT1, on=c(col5="col2", col4="col3"), nomatch=0L]

See the secondary indices vignette for more.


Alternatively if you've the data.tables keyed, then you can skip the on= argument. But the solution above would be idiomatic as it retains the order of original data.tables, and it is clear to tell what columns are being looked up by looking at the code.

setkey(DT1, col2, col3)
setkey(DT2, col5, col4)
DT2[DT1, nomatch=0L]

See history for older versions.

like image 147
Arun Avatar answered Oct 05 '22 11:10

Arun