Let's say I have two data tables (dt1 and dt2), and I want to get dt3 from them. A, B, C, E, F, G, H are column names. The key of dt1 is column A, and the key of dt2 is column E. The tables have different numbers of rows. I want to keep all the columns from dt1 and add only one column (H) from dt2 to the joined table. Eventually, I will store the result as dt1 (though I show it as dt3 below).
How can I achieve this with data.table? I have an ugly solution using merge and data frames.
dt1
A B C
1 4 7
2 5 8
3 6 9
2 20 21
dt2
E F G H
1 10 13 16
3 12 15 18
2 11 14 17
dt3
A B C H
1 4 7 16
2 5 8 17
3 6 9 18
2 20 21 17
To perform a left join onto dt1 and add column H from dt2, you can combine a binary join with the update-by-reference operator (:=):
setkey(setDT(dt1), A)
dt1[dt2, H := i.H]
See here and here for a detailed explanation of how it works.
With the devel version (v >= 1.9.5) we can make it even shorter by specifying the key within setDT (as pointed out by @Arun):
setDT(dt1, key = "A")[dt2, H := i.H]
Edit 24/7/2015
You can now run a binary join using the new on parameter, without setting keys:
setDT(dt1)[dt2, H := i.H, on = c(A = "E")]
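For completeness, here is a minimal reproducible sketch of the on-based join, using the tables from the question. It assumes the columns are plain numeric vectors and that the data.table package is installed; the tables are built directly with data.table(), so setDT is not needed here.

```r
library(data.table)

# Tables from the question, assumed to hold plain numeric columns
dt1 <- data.table(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
dt2 <- data.table(E = c(1, 3, 2), F = c(10, 12, 11),
                  G = c(13, 15, 14), H = c(16, 18, 17))

# Match dt1$A against dt2$E and add H to dt1 by reference;
# the i. prefix refers to columns of the table passed in i (dt2)
dt1[dt2, H := i.H, on = c(A = "E")]

print(dt1)
```

Because the update happens by reference, dt1 keeps its original row order and no intermediate copy (the dt3 of the question) is created.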
A data.table solution using match:
setDT(dt1)[ , H := dt2$H[match(dt1$A , dt2$E)] , ]
# A B C H
# 1: 1 4 7 16
# 2: 2 5 8 17
# 3: 3 6 9 18
# 4: 2 20 21 17
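One thing the question's data does not exercise is a value of A with no counterpart in E. The sketch below uses hypothetical data (the row with A = 4 is made up for illustration) to compare the join-based update with the match-based one in that case; copy() is used only so that both variants start from the same untouched table.

```r
library(data.table)

# Hypothetical data: A = 4 has no matching value in dt2$E
dt1 <- data.table(A = c(1, 2, 4), B = c(4, 5, 6))
dt2 <- data.table(E = c(1, 2), H = c(16, 17))

# Update join: only rows of dt1 that match dt2 receive a value
joined <- copy(dt1)[dt2, H := i.H, on = c(A = "E")]

# match-based lookup: match() returns NA where A is absent from E
matched <- copy(dt1)[, H := dt2$H[match(A, dt2$E)]]

print(joined)
print(matched)
```

In both variants the unmatched row stays in the result with H set to NA, which is the behaviour expected of a left join.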
Another solution, using dplyr:
library(dplyr)
# Left join on A == E, then keep only the columns of dt1 plus H
left_join(x = dt1, y = dt2, by = c("A" = "E")) %>%
  select(one_of(c("A", "B", "C", "H")))