 

Three-way data.table merge behavior inconsistency

Tags: r, data.table

I've been searching around this morning to figure out whether the failure below is expected, but haven't found anything. Could anyone point me to a related discussion? Otherwise, I might submit it as an issue. Thanks in advance.

library(data.table)

x <- data.table( a = 1:3 )
y <- data.table( a = 2:4 )
z <- data.table( a = 3:5 )

# works
merge( x , y )
# works
merge( y , z )

# fails
merge( x , merge( y , z ) )
# Error in merge.data.table(x, merge(y, z)) :
#   A non-empty vector of column names for `by` is required.

# works
merge( merge( x , y ) , z )
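Passing the join column explicitly appears to avoid the error, because merge.data.table only runs its fallback logic for `by` when `by` is not supplied. This is a minimal sketch assuming data.table is installed and that the shared column is named a, as above; it is a workaround, not a fix.

```r
library(data.table)

x <- data.table( a = 1:3 )
y <- data.table( a = 2:4 )
z <- data.table( a = 3:5 )

# Supplying `by` explicitly skips the is.null(by) fallback chain,
# so the key that the inner merge sets on its result no longer matters.
res <- merge( x , merge( y , z ) , by = "a" )
res
```

This gives the same rows as merge( merge( x , y ) , z ), the one value of a shared by all three tables.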
asked Oct 21 '20 13:10 by Anthony Damico




1 Answer

This is a clear bug. Please report it. Luckily, it should be easy to fix.

merge.data.table contains this code:

if (is.null(by)) 
  by = intersect(key(x), key(y))
if (is.null(by)) 
  by = key(x)
if (is.null(by)) 
  by = intersect(names(x), names(y))

The issue is that y, here the result of the inner merge, is keyed, because merge.data.table sets a key on its output:

x <- data.table( a = 1:3 )
y <- merge(data.table( a = 2:4 ), data.table( a = 3:5 ))
haskey(y)
#[1] TRUE

Then,

intersect(key(x), key(y))
#character(0)

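Because by is now character(0) rather than NULL, neither of the remaining is.null(by) checks is TRUE, so by stays empty and merge.data.table raises the error above (we would want the third fallback, intersect(names(x), names(y)), to apply here).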

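This doesn't happen in your last case, where the keyed table is x and the unkeyed one is y, because intersect() returns NULL rather than character(0) when its second argument is NULL, so the key(x) fallback applies: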

intersect("foo", NULL)
#NULL
intersect(NULL, "foo")
#character(0)
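The failure comes down to is.null() treating character(0) as a real value. The base-R sketch below mimics the three fallbacks quoted above for the case where by is not supplied; resolve_by() is a hypothetical helper, not part of data.table, and its first argument stands in for whatever intersect(key(x), key(y)) returned. Testing for emptiness with length() instead of is.null() is one plausible fix, offered as an assumption rather than as data.table's actual patch.

```r
# Hypothetical stand-in for merge.data.table's `by` fallback chain
# (only the case where the caller did not pass `by`).
#   first    : result of intersect(key(x), key(y))
#   key_x    : key(x), NULL when x is unkeyed
#   common   : intersect(names(x), names(y))
#   is_empty : the emptiness test applied before each fallback
resolve_by <- function(first, key_x, common, is_empty) {
  by <- first
  if (is_empty(by)) by <- key_x
  if (is_empty(by)) by <- common
  by
}

null_test   <- is.null                        # the check quoted above
length_test <- function(v) length(v) == 0L    # treats NULL and character(0) alike

# Failing case: x unkeyed, y keyed, intersect() gave character(0)
resolve_by(character(0), NULL, "a", null_test)    # character(0), so merge errors
resolve_by(character(0), NULL, "a", length_test)  # "a"

# Working case: x keyed on "a", y unkeyed, intersect() gave NULL
resolve_by(NULL, "a", "a", null_test)             # "a", taken from key(x)
```

With the length() test, both NULL and character(0) fall through to the next option, so the third fallback is reached in the failing case.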
answered Oct 27 '22 03:10 by Roland