Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

One-to-Many Left Join / Merge in Data.Table in R

Tags:

r

data.table

If you know of an answer to this question elsewhere, please let me know, but I searched and read a large number of data.table-related join topics as well as the package's intro and FAQ and still haven't found an answer. Essentially I am looking for something like a flexible left join and I would have thought setting the keys directly that A[B] would do the trick, but it does not.

Essentially I have two data sets, A and B (example below). I would like to add the data in A to B based on matching their ids and years, resulting in something like C. There are a few things that distinguish this from the examples I've seen. First, its one to many based on multiple keys (here id and year). Second, I don't want all of the possible combinations, meaning if information in A is not needed in B then that is not added. Third, I'd like to preserve B where data in A is not available (so not just the intersection).

Again, I thought something like setting the keys in A and B to be id and year, then C = A[B] would give me the answer, but I get an error saying that many (many, many) more rows are in the result, triggering the Cartesian warning. However, the result should have the exact number of rows as B, so its unclear to me what I am doing wrong.

Any advice or insights are much appreciated!

A
#    id   year   var1    var2
#    1    2010    10     20
#    1    2011    70     20
#    1    2012    90     30
#    1    2013    40     50 
#    2    2010    30     30
#    2    2011    50     60
#    2    2012    80     10
#    2    2013    90     80 
#    3    2010    50     50
#    3    2011    20     40
#    3    2012    90     30
#    3    2013    60     70 


B
#    id2  year    id    var3
#    1    2010    1     20
#    1    2011    1     20
#    1    2012    3     30
#    1    2013    4     50 
#    2    2010    1     30
#    2    2011    1     60
#    2    2012    1     10
#    2    2013    2     80 
#    3    2010    3     50
#    3    2011    2     40
#    3    2012    3     30
#    3    2013    2     70 

C
#    id2  year    id    var3   var1   var2
#    1    2010    1     20      10     20
#    1    2011    1     20      70     20
#    1    2012    3     30      90     30
#    1    2013    4     50      NA     NA 
#    2    2010    1     30      10     20
#    2    2011    1     60      70     20
#    2    2012    1     10      90     30
#    2    2013    2     80      90     80 
#    3    2010    3     50      50     50
#    3    2011    2     40      50     60
#    3    2012    3     30      90     30
#    3    2013    2     70      90     80  
like image 828
DaedalusBloom Avatar asked Dec 10 '25 10:12

DaedalusBloom


1 Answers

You did not indicate what you would like to see as the final result. You have three main options:

 A[B]
 B[A]
 merge(A, B, all=TRUE)

Note that the allow.cartesian error is something you will see if there is any duplicated keys in the joining table. In which case you can simply set the flag to TRUE or instead remove the duplicates

 B[unique(A)]
 # or
 B[A, allow=TRUE]

setkey(A, id, year)
setkey(B, id, year)


A[B]
#     id year var1 var2 id2 var3
#  1:  1 2010   10   20   1   20
#  2:  1 2010   10   20   2   30
#  3:  1 2011   70   20   1   20
#  4:  1 2011   70   20   2   60
#  5:  1 2012   90   30   2   10
#  6:  2 2011   50   60   3   40
#  7:  2 2013   90   80   2   80
#  8:  2 2013   90   80   3   70
#  9:  3 2010   50   50   3   50
# 10:  3 2012   90   30   1   30
# 11:  3 2012   90   30   3   30
# 12:  4 2013   NA   NA   1   50

B[A, allow=TRUE]
#     id year id2 var3 var1 var2
#  1:  1 2010   1   20   10   20
#  2:  1 2010   2   30   10   20
#  3:  1 2011   1   20   70   20
#  4:  1 2011   2   60   70   20
#  5:  1 2012   2   10   90   30
#  6:  1 2013  NA   NA   40   50
#  7:  2 2010  NA   NA   30   30
#  8:  2 2011   3   40   50   60
#  9:  2 2012  NA   NA   80   10
# 10:  2 2013   2   80   90   80
# 11:  2 2013   3   70   90   80
# 12:  3 2010   3   50   50   50
# 13:  3 2011  NA   NA   20   40
# 14:  3 2012   1   30   90   30
# 15:  3 2012   3   30   90   30
# 16:  3 2013  NA   NA   60   70

merge(A, B, all=TRUE, allow=TRUE)
#     id year var1 var2 id2 var3
#  1:  1 2010   10   20   1   20
#  2:  1 2010   10   20   2   30
#  3:  1 2011   70   20   1   20
#  4:  1 2011   70   20   2   60
#  5:  1 2012   90   30   2   10
#  6:  1 2013   40   50  NA   NA
#  7:  2 2010   30   30  NA   NA
#  8:  2 2011   50   60   3   40
#  9:  2 2012   80   10  NA   NA
# 10:  2 2013   90   80   2   80
# 11:  2 2013   90   80   3   70
# 12:  3 2010   50   50   3   50
# 13:  3 2011   20   40  NA   NA
# 14:  3 2012   90   30   1   30
# 15:  3 2012   90   30   3   30
# 16:  3 2013   60   70  NA   NA
# 17:  4 2013   NA   NA   1   50
like image 138
Ricardo Saporta Avatar answered Dec 13 '25 21:12

Ricardo Saporta



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!