When assigning by reference in a data.table using a column from a second data.table, the results are inconsistent. When there are no matches on the key columns of the two data.tables, it appears the assignment expression y := y is totally ignored - not even NAs are returned.
library(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
print(dt1[dt2, y := y])
## id x # Would have also expected column: y
## 1: 1 3 # NA
## 2: 2 4 # NA
However, when there is a partial match, the non-matching rows get a placeholder NA.
dt2[, id := 2:3]
print(dt1[dt2, y := y])
## id x y
## 1: 1 3 NA # <-- placeholder NA here
## 2: 2 4 5
This wreaks havoc on later code that assumes a y column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take both cases into account.
Is there an elegant way around this inconsistency?
With this recent commit, this issue, #759, is now fixed in v1.9.7. It works as expected when nomatch=NA (the current default).
require(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
dt1[dt2, y := y][]
# id x y
# 1: 1 3 NA
# 2: 2 4 NA
Using merge works:
> dt3 <- merge(dt1, dt2, by='id', all.x=TRUE)
> dt3
id x y
1: 1 3 NA
2: 2 4 NA
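Note that merge returns a new table rather than updating dt1 by reference. If the by-reference update matters and you are on a version older than the fix, one possible workaround is to create the y column up front so that it exists whether or not any rows match. This is only a minimal sketch, not something taken from the answers above: it assumes y is an integer column (hence NA_integer_), and dt2b is a hypothetical table added here to show the partial-match case.

```r
library(data.table)

dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")

# Pre-create y so the column exists even if the join matches no rows.
# NA_integer_ is an assumption: it matches the integer type of dt2$y.
dt1[, y := NA_integer_]

# Update by reference; i.y refers explicitly to the y column of dt2.
dt1[dt2, y := i.y]
# No ids match here, so y stays NA in both rows.

# Partial match (dt2b is hypothetical): only id 2 is present in both tables.
dt2b <- data.table(id = 2:3, y = 5:6, key = "id")
dt1[dt2b, y := i.y]
# Now y is NA for id 1 and 5 for id 2.
```

Writing i.y instead of a bare y also avoids ambiguity once dt1 has its own y column.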