If you know of an answer to this question elsewhere, please let me know, but I searched and read a large number of data.table-related join topics as well as the package's intro and FAQ and still haven't found an answer. Essentially I am looking for something like a flexible left join and I would have thought setting the keys directly that A[B] would do the trick, but it does not.
Essentially I have two data sets, A and B (example below). I would like to add the data in A to B based on matching their ids and years, resulting in something like C. There are a few things that distinguish this from the examples I've seen. First, its one to many based on multiple keys (here id and year). Second, I don't want all of the possible combinations, meaning if information in A is not needed in B then that is not added. Third, I'd like to preserve B where data in A is not available (so not just the intersection).
Again, I thought something like setting the keys in A and B to be id and year, then C = A[B] would give me the answer, but I get an error saying that many (many, many) more rows are in the result, triggering the Cartesian warning. However, the result should have the exact number of rows as B, so its unclear to me what I am doing wrong.
Any advice or insights are much appreciated!
A
# id year var1 var2
# 1 2010 10 20
# 1 2011 70 20
# 1 2012 90 30
# 1 2013 40 50
# 2 2010 30 30
# 2 2011 50 60
# 2 2012 80 10
# 2 2013 90 80
# 3 2010 50 50
# 3 2011 20 40
# 3 2012 90 30
# 3 2013 60 70
B
# id2 year id var3
# 1 2010 1 20
# 1 2011 1 20
# 1 2012 3 30
# 1 2013 4 50
# 2 2010 1 30
# 2 2011 1 60
# 2 2012 1 10
# 2 2013 2 80
# 3 2010 3 50
# 3 2011 2 40
# 3 2012 3 30
# 3 2013 2 70
C
# id2 year id var3 var1 var2
# 1 2010 1 20 10 20
# 1 2011 1 20 70 20
# 1 2012 3 30 90 30
# 1 2013 4 50 NA NA
# 2 2010 1 30 10 20
# 2 2011 1 60 70 20
# 2 2012 1 10 90 30
# 2 2013 2 80 90 80
# 3 2010 3 50 50 50
# 3 2011 2 40 50 60
# 3 2012 3 30 90 30
# 3 2013 2 70 90 80
You did not indicate what you would like to see as the final result. You have three main options:
A[B]
B[A]
merge(A, B, all=TRUE)
Note that the allow.cartesian error is something you will see if there is any duplicated keys in the joining table. In which case you can simply set the flag to TRUE or instead remove the duplicates
B[unique(A)]
# or
B[A, allow=TRUE]
setkey(A, id, year)
setkey(B, id, year)
A[B]
# id year var1 var2 id2 var3
# 1: 1 2010 10 20 1 20
# 2: 1 2010 10 20 2 30
# 3: 1 2011 70 20 1 20
# 4: 1 2011 70 20 2 60
# 5: 1 2012 90 30 2 10
# 6: 2 2011 50 60 3 40
# 7: 2 2013 90 80 2 80
# 8: 2 2013 90 80 3 70
# 9: 3 2010 50 50 3 50
# 10: 3 2012 90 30 1 30
# 11: 3 2012 90 30 3 30
# 12: 4 2013 NA NA 1 50
B[A, allow=TRUE]
# id year id2 var3 var1 var2
# 1: 1 2010 1 20 10 20
# 2: 1 2010 2 30 10 20
# 3: 1 2011 1 20 70 20
# 4: 1 2011 2 60 70 20
# 5: 1 2012 2 10 90 30
# 6: 1 2013 NA NA 40 50
# 7: 2 2010 NA NA 30 30
# 8: 2 2011 3 40 50 60
# 9: 2 2012 NA NA 80 10
# 10: 2 2013 2 80 90 80
# 11: 2 2013 3 70 90 80
# 12: 3 2010 3 50 50 50
# 13: 3 2011 NA NA 20 40
# 14: 3 2012 1 30 90 30
# 15: 3 2012 3 30 90 30
# 16: 3 2013 NA NA 60 70
merge(A, B, all=TRUE, allow=TRUE)
# id year var1 var2 id2 var3
# 1: 1 2010 10 20 1 20
# 2: 1 2010 10 20 2 30
# 3: 1 2011 70 20 1 20
# 4: 1 2011 70 20 2 60
# 5: 1 2012 90 30 2 10
# 6: 1 2013 40 50 NA NA
# 7: 2 2010 30 30 NA NA
# 8: 2 2011 50 60 3 40
# 9: 2 2012 80 10 NA NA
# 10: 2 2013 90 80 2 80
# 11: 2 2013 90 80 3 70
# 12: 3 2010 50 50 3 50
# 13: 3 2011 20 40 NA NA
# 14: 3 2012 90 30 1 30
# 15: 3 2012 90 30 3 30
# 16: 3 2013 60 70 NA NA
# 17: 4 2013 NA NA 1 50
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With