I need to merge some data from one data.table into another. I know how to add a new column from one data.table to another in a join. Below, the values from column b in df2 are added to df1 in a join on the Id column:
library(data.table)
df1 <- data.table(Id = letters[1:5], a = 1:5)
df2 <- data.table(Id = letters[1:3], a = 7:9, b = 7:9)
setkey(df1, Id)
setkey(df2, Id)
df1[df2, b := b][]
#> Id a b
#> 1: a 1 7
#> 2: b 2 8
#> 3: c 3 9
#> 4: d 4 NA
#> 5: e 5 NA
However, that idiom does not work when the column already exists in df1, here column a:
df1[df2, a := a][]
#> Id a
#> 1: a 1
#> 2: b 2
#> 3: c 3
#> 4: d 4
#> 5: e 5
I understand that a is not updated by this assignment because the column a already exists in df1. The reference to a on the right-hand side of the assignment resolves to that column, not the one in df2.
So how can I update the values in df1$a with those in df2$a in a join on matching Id, to get the following:
#> Id a
#> 1: a 7
#> 2: b 8
#> 3: c 9
#> 4: d 4
#> 5: e 5
From ?data.table:
When i is a data.table, the columns of i can be referred to in j by using the prefix i., e.g., X[Y, .(val, i.val)]. Here val refers to X's column and i.val Y's.
Thus, in the RHS of :=, use the i. prefix to refer to the a column in df2, i.a:
library(data.table)
df1 <- data.table(Id = letters[1:5], a = 1:5)
df2 <- data.table(Id = letters[1:3], a = 7:9, b = 7:9)
setkey(df1, Id)
setkey(df2, Id)
df1[df2, a := i.a]
# or, instead of setting keys, use the `on` argument:
df1[df2, on = .(Id), a := i.a]
df1
# Id a
# <char> <int>
# 1: a 7
# 2: b 8
# 3: c 9
# 4: d 4
# 5: e 5
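To see how the two columns named a resolve before overwriting anything, it can help to select both in j first. The following is a minimal sketch under the same example data; it assumes a data.table version that also supports the x. prefix for the columns of the outer table, and the name preview is only illustrative. It then updates the existing column a and adds b in a single join.

```r
library(data.table)

df1 <- data.table(Id = letters[1:5], a = 1:5)
df2 <- data.table(Id = letters[1:3], a = 7:9, b = 7:9)

# Look at both columns named `a` for the matching rows:
# x.a is the value from df1, i.a is the value from df2.
# (`preview` is a hypothetical name used only for this sketch.)
preview <- df1[df2, on = .(Id), .(Id, x.a, i.a)]
print(preview)

# Update the existing column `a` and add the new column `b` in one join.
# Rows of df1 without a match in df2 keep their `a` and get NA for `b`.
df1[df2, on = .(Id), `:=`(a = i.a, b = i.b)]
print(df1)
```

Using on = .(Id) here avoids setting keys, so neither table is reordered as a side effect of the update.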