In the data.table FAQ, the nomatch = NA parameter is said to be akin to an outer join. However, I haven't been able to get data.table to do a full outer join, only right outer joins.
For example:

    library(data.table)

    a <- data.table("dog" = c(8:12), "cat" = c(15:19))
    #    dog cat
    # 1:   8  15
    # 2:   9  16
    # 3:  10  17
    # 4:  11  18
    # 5:  12  19

    b <- data.table("dog" = 1:10, "bullfrog" = 11:20)
    #     dog bullfrog
    #  1:   1       11
    #  2:   2       12
    #  3:   3       13
    #  4:   4       14
    #  5:   5       15
    #  6:   6       16
    #  7:   7       17
    #  8:   8       18
    #  9:   9       19
    # 10:  10       20

    setkey(a, dog)
    setkey(b, dog)

    a[b, nomatch = NA]
    #     dog cat bullfrog
    #  1:   1  NA       11
    #  2:   2  NA       12
    #  3:   3  NA       13
    #  4:   4  NA       14
    #  5:   5  NA       15
    #  6:   6  NA       16
    #  7:   7  NA       17
    #  8:   8  15       18
    #  9:   9  16       19
    # 10:  10  17       20

So, nomatch = NA produces a right outer join (which is the default). What if I need a full join? For example:

    merge(a, b, by = "dog", all = TRUE)
    # Or with plyr:
    join(a, b, by = "dog", type = "full")
    #     dog cat bullfrog
    #  1:   1  NA       11
    #  2:   2  NA       12
    #  3:   3  NA       13
    #  4:   4  NA       14
    #  5:   5  NA       15
    #  6:   6  NA       16
    #  7:   7  NA       17
    #  8:   8  15       18
    #  9:   9  16       19
    # 10:  10  17       20
    # 11:  11  18       NA
    # 12:  12  19       NA

Is that possible with data.table?
The SQL FULL JOIN command: LEFT JOIN and RIGHT JOIN each return unmatched rows from one of the tables, while FULL JOIN returns unmatched rows from both tables. It is commonly used in conjunction with aggregations to understand the amount of overlap between two tables.
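To make the overlap idea concrete for the tables in the question, here is a minimal sketch in R. It assumes the data.table package is installed; the flag columns in_a and in_b and the summary names both, only_a and only_b are introduced here purely for illustration.

```r
library(data.table)

a <- data.table(dog = 8:12, cat = 15:19)
b <- data.table(dog = 1:10, bullfrog = 11:20)

# Keep only the join column plus a flag (hypothetical names in_a / in_b)
# recording which table each row came from.
flags_a <- a[, .(dog, in_a = TRUE)]
flags_b <- b[, .(dog, in_b = TRUE)]

# Full outer join: a flag is NA wherever the other table had no match.
full <- merge(flags_a, flags_b, by = "dog", all = TRUE)

# Aggregate the flags to count matched and unmatched keys.
overlap <- full[, .(
  both   = sum(!is.na(in_a) & !is.na(in_b)),
  only_a = sum(is.na(in_b)),
  only_b = sum(is.na(in_a))
)]
print(overlap)
```

Using separate flag columns rather than testing cat or bullfrog for NA keeps the counts correct even if the original value columns contain missing values of their own.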
You actually have it right there. Use merge.data.table, which is exactly what you are doing when you call

    merge(a, b, by = "dog", all = TRUE)

Since a is a data.table, merge(a, b, ...) calls merge.data.table(a, b, ...).
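
As a quick check of this answer, the sketch below runs merge() on the tables from the question with each combination of the all, all.x and all.y arguments. The variable names full, left, right and inner are chosen here for illustration only.

```r
library(data.table)

a <- data.table(dog = 8:12, cat = 15:19)
b <- data.table(dog = 1:10, bullfrog = 11:20)

# merge() dispatches on the class of its first argument, so these calls
# go to merge.data.table and return data.tables.
full  <- merge(a, b, by = "dog", all = TRUE)    # full outer join
left  <- merge(a, b, by = "dog", all.x = TRUE)  # keep every row of a
right <- merge(a, b, by = "dog", all.y = TRUE)  # keep every row of b
inner <- merge(a, b, by = "dog")                # matching rows only

print(c(full = nrow(full), left = nrow(left),
        right = nrow(right), inner = nrow(inner)))
```

The right variant should contain the same rows as a[b, nomatch = NA] from the question, which is why that idiom behaves like a right outer join rather than a full one.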
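
    library(data.table)

    x <- data.table(a = 1:5, b = 11:15)
    y <- data.table(a = c(1:4, 6), c = c(101:104, 106))
    setkey(x, a)
    setkey(y, a)

    # Collect every key that occurs in either table, look those keys up in x,
    # then join the result to y.
    unique_keys <- unique(c(x[, a], y[, a]))
    y[x[.(unique_keys), on = "a"]]  # Full outer join
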
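
A variant of the same idea that skips setkey and names the join column with on = in both steps may be easier to follow. This is only a sketch: all_keys is a name introduced here for illustration, the inputs use integer keys (6L, 106L) so that both key columns share a type, and it assumes keys are unique within each table.

```r
library(data.table)

x <- data.table(a = 1:5, b = 11:15)
y <- data.table(a = c(1:4, 6L), c = c(101:104, 106L))

# One row per key that appears in either table (hypothetical helper table).
all_keys <- data.table(a = union(x$a, y$a))

# Step 1: look up every key in x; keys missing from x get b = NA.
# Step 2: join that result to y; keys missing from y get c = NA.
full <- y[x[all_keys, on = "a"], on = "a"]
print(full)

# For comparison, the same join through merge().
m <- merge(x, y, by = "a", all = TRUE)
```

The two results hold the same rows, but the column order differs (a, c, b from the chained join versus a, b, c from merge), and only the merge result carries a key.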