
How does one do a full join using data.table?

In the data.table FAQ, the nomatch = NA parameter is said to be akin to an outer join. However, I haven't been able to get data.table to do a full outer join – only right outer joins.

For example:

    a <- data.table("dog" = c(8:12), "cat" = c(15:19))

       dog cat
    1:   8  15
    2:   9  16
    3:  10  17
    4:  11  18
    5:  12  19

    b <- data.table("dog" = 1:10, "bullfrog" = 11:20)

        dog bullfrog
     1:   1       11
     2:   2       12
     3:   3       13
     4:   4       14
     5:   5       15
     6:   6       16
     7:   7       17
     8:   8       18
     9:   9       19
    10:  10       20

    setkey(a, dog)
    setkey(b, dog)

    a[b, nomatch = NA]

        dog cat bullfrog
     1:   1  NA       11
     2:   2  NA       12
     3:   3  NA       13
     4:   4  NA       14
     5:   5  NA       15
     6:   6  NA       16
     7:   7  NA       17
     8:   8  15       18
     9:   9  16       19
    10:  10  17       20

So, nomatch = NA produces a right outer join (which is the default). What if I need a full join? For example:

    merge(a, b, by = "dog", all = TRUE)
    # Or with plyr:
    join(a, b, by = "dog", type = "full")

        dog cat bullfrog
     1:   1  NA       11
     2:   2  NA       12
     3:   3  NA       13
     4:   4  NA       14
     5:   5  NA       15
     6:   6  NA       16
     7:   7  NA       17
     8:   8  15       18
     9:   9  16       19
    10:  10  17       20
    11:  11  18       NA
    12:  12  19       NA

Is that possible with data.table?

Paul Murray asked Mar 02 '13 04:03





2 Answers

You actually have it right there. Use merge.data.table, which is exactly what you are doing when you call

merge(a, b, by = "dog", all = TRUE) 

since a is a data.table, merge(a, b, ...) dispatches to merge.data.table(a, b, ...).
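
As a minimal sketch of what that dispatch gives you, the block below reuses the a and b tables from the question and runs merge() with the all, all.x and all.y arguments to get the full, left and right outer joins, plus the default inner join. The result names (full, left, right, inner) are only illustrative.

```r
library(data.table)

# Tables from the question
a <- data.table(dog = 8:12, cat = 15:19)
b <- data.table(dog = 1:10, bullfrog = 11:20)

# merge() is an S3 generic; because a is a data.table, each call
# below ends up in merge.data.table().
full  <- merge(a, b, by = "dog", all = TRUE)    # full outer join
left  <- merge(a, b, by = "dog", all.x = TRUE)  # keep every row of a
right <- merge(a, b, by = "dog", all.y = TRUE)  # keep every row of b
inner <- merge(a, b, by = "dog")                # matching rows only

print(full)
```

Only the all = TRUE call keeps the unmatched rows from both tables, which is the behaviour the question asks for.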

Ricardo Saporta answered Sep 21 '22 21:09



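    x <- data.table(a = 1:5, b = 11:15)
    y <- data.table(a = c(1:4, 6), c = c(101:104, 106))

    setkey(x, a)
    setkey(y, a)

    # Collect every key value that appears in either table
    unique_keys <- unique(c(x[, a], y[, a]))

    # Look up all keys in x, then join that result to y on its key
    y[x[.(unique_keys), on = "a"]]  # Full Outer Join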
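
The same idea can be applied to the tables from the question. This is a rough sketch rather than part of the original answer: it assumes the join column is dog, uses on = instead of setkey(), and the names all_dogs and full are made up for illustration.

```r
library(data.table)

# Tables from the question
a <- data.table(dog = 8:12, cat = 15:19)
b <- data.table(dog = 1:10, bullfrog = 11:20)

# Every key value that occurs in either table (hypothetical helper table)
all_dogs <- data.table(dog = union(a$dog, b$dog))

# Look up each key in a, then look up that result in b.
# Keys with no match get NA because nomatch = NA is the default.
full <- b[a[all_dogs, on = "dog"], on = "dog"]

# Sort by the join column so the rows line up with merge(..., all = TRUE)
setorder(full, dog)
print(full)
```

The column order differs from the merge() result (bullfrog comes before cat here), but the rows should be the same 12 keys with NA wherever one side has no match.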
Ashrith Reddy answered Sep 23 '22 21:09
