Joining two data.tables in R based on multiple keys and duplicate entries

Tags: join, r, data.table

I am trying to join two data.tables in R based on multiple keys, where both tables contain repeated entries. As an example:

>DT1
ID  state Month Day Year
1   IL    Jan   3   2013 
1   IL    Jan   3   2014
1   IL    Jan   3   2014
1   IL    Jan   10  2014
1   IL    Jan   11  2013
1   IL    Jan   30  2013
1   IL    Jan   30  2013
1   IL    Feb   2   2013
1   IL    Feb   2   2014
1   IL    Feb   3   2013
1   IL    Feb   3   2014

>DT2
state Month   Day   Year  Tavg
  IL    Jan    1    2013    13
  IL    Jan    2    2013    19
  IL    Jan    3    2013    22
  IL    Jan    4    2013    23
  IL    Jan    5    2013    26
  IL    Jan    6    2013    24
  IL    Jan    7    2013    27
  IL    Jan    8    2013    32
  IL    Jan    9    2013    36
  ...   ...    ..   ...      ... 
  ...   ...    ..   ...      ... 
  IL    Dec    31   2013    33

I would like to add the "Tavg" values of DT2 to the corresponding dates in DT1. For example, all entries in DT1 dated Jan 3 2013 need to have Tavg 22 in an additional column.

I tried the following: setkey(DT1, state, Month, Day, Year), and the same for DT2, followed by the join DT1[DT2, nomatch=0, allow.cartesian=TRUE], but it didn't work.

Gabriel asked Mar 16 '15 16:03


1 Answer

Just helped a friend with this (he couldn't find a good Stack Overflow answer) so I figured this question needed a more complete "toy" answer.

Here are a couple of simple data tables with one mismatched key:

library(data.table)
dt1 <- data.table(a = LETTERS[1:5], b = letters[1:5], c = 1:5)
dt2 <- data.table(c = LETTERS[c(1:4, 6)], b = letters[1:5], a = 6:10)

And here are several multiple-key merge options:

merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"))             # Inner join
merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"), all = TRUE) # Full outer join

setkey(dt1,a,b)
setkey(dt2,c,b)

dt2[dt1] #Left Join (if dt1 is the "left" table)
dt1[dt2] #Right Join (if dt1 is the "left" table)
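
Applied to the tables in the question, the same idea might look like the sketch below. It rebuilds only a few rows of DT1 and DT2 from the post (the full tables are not shown, so these are stand-ins), and it assumes the Day and Year columns have the same type in both tables. It uses the `on =` argument instead of setkey so that neither table is reordered; `keys` and `res` are just illustrative names.

```r
library(data.table)

# Small stand-ins for the question's tables (values copied from the post)
DT1 <- data.table(ID = 1L, state = "IL",
                  Month = c("Jan", "Jan", "Jan", "Jan"),
                  Day   = c(3L, 3L, 3L, 10L),
                  Year  = c(2013L, 2014L, 2014L, 2014L))
DT2 <- data.table(state = "IL", Month = "Jan",
                  Day = 1:9, Year = 2013L,
                  Tavg = c(13, 19, 22, 23, 26, 24, 27, 32, 36))

keys <- c("state", "Month", "Day", "Year")

# Option 1: DT2[DT1] keeps every row of DT1, duplicates included,
# and fills Tavg with NA where DT2 has no matching date
res <- DT2[DT1, on = keys]

# Option 2: the same left join via merge() (result is sorted by the keys)
res_merge <- merge(DT1, DT2, by = keys, all.x = TRUE)

# Option 3: add Tavg to DT1 in place, without copying the table
DT1[DT2, on = keys, Tavg := i.Tavg]
```

Because each date appears only once in DT2, every DT1 row matches at most one DT2 row, so the result has exactly as many rows as DT1 and allow.cartesian should not be needed here.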
D. Woods answered Oct 17 '22 15:10