If I understand correctly, by default data.table
merges two tables by comparing columns which are set as keys and have same names. How should I write if I have tables with different column names? For example:
set.seed(123)
DT1<-data.table(col1=sample(letters,5,replace=TRUE),col2=sample(LETTERS[1:5],5,replace=TRUE),col3=sample(1:2,5,replace=TRUE))
DT2<-data.table(col4=sample(1:3,10,replace=TRUE),col5=sample(LETTERS[1:5],10,replace=TRUE),col6=sample(1:100,10,replace=TRUE))
(DT1)
(DT2)
> (DT1)
col1 col2 col3
1: h A 2
2: u C 1
3: k E 2
4: w C 2
5: y C 1
> (DT2)
col4 col5 col6
1: 3 D 48
2: 1 C 76
3: 1 C 22
4: 1 B 32
5: 3 A 24
6: 3 E 15
7: 3 E 42
8: 2 D 42
9: 3 D 37
10: 2 A 16
What should I write in DT1[
to make a merge selecting only rows with col2==col5 & col3==col4?
Here is an expected output:
col1 col2 col3 col4 col5 col6
h A 2 2 A 16
u C 1 1 C 76
u C 1 1 C 22
y C 1 1 C 76
y C 1 1 C 22
Thanks in advance!
Using data.table's subset based joins along with the recently implemented on=
argument and nomatch=0L
, this is simply:
DT2[DT1, on=c(col5="col2", col4="col3"), nomatch=0L]
See the secondary indices vignette for more.
Alternatively if you've the data.tables keyed, then you can skip the on=
argument. But the solution above would be idiomatic as it retains the order of original data.tables, and it is clear to tell what columns are being looked up by looking at the code.
setkey(DT1, col2, col3)
setkey(DT2, col5, col4)
DT2[DT1, nomatch=0L]
See history for older versions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With