Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inconsistent data.table assignment by reference behaviour

Tags:

r

data.table

When assigning by reference with a data.table using a column from a second data.table, the results are inconsistent. When there are no matches by the key columns of both data.tables, it appears the assigment expression y := y is totally ignored - not even NAs are returned.

library(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
print(dt1[dt2, y := y])
##    id x     # Would have also expected column:   y
## 1:  1 3     #                                   NA
## 2:  2 4     #                                   NA

However, when there is a partial match, non-matching columns have a placeholder NA.

dt2[, id := 2:3]
print(dt1[dt2, y := y])
##    id x  y
## 1:  1 3 NA    # <-- placeholder NA here
## 2:  2 4  5

This wreaks havoc on later code that assumes a y column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take into account both cases.

Is there an elegant way around this inconsistency?

like image 760
mchen Avatar asked Aug 05 '14 17:08

mchen


2 Answers

With this recent commit, this issue, #759, is now fixed in v1.9.7. It works as expected when nomatch=NA (the current default).

require(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
dt1[dt2, y := y][]
#    id x  y
# 1:  1 3 NA
# 2:  2 4 NA
like image 163
Arun Avatar answered Nov 07 '22 14:11

Arun


Using merge works:

> dt3 <- merge(dt1, dt2, by='id', all.x=TRUE)
> dt3
   id x  y
1:  1 3 NA
2:  2 4 NA
like image 25
Tchotchke Avatar answered Nov 07 '22 14:11

Tchotchke