Joining two data.tables in R based on multiple keys and duplicate entries

Tags: join, r, data.table

I am trying to join two data.tables in R based on multiple keys, where the tables contain repeated entries. As an example:

>DT1
ID  state Month Day Year
1   IL    Jan   3   2013 
1   IL    Jan   3   2014
1   IL    Jan   3   2014
1   IL    Jan   10  2014
1   IL    Jan   11  2013
1   IL    Jan   30  2013
1   IL    Jan   30  2013
1   IL    Feb   2   2013
1   IL    Feb   2   2014
1   IL    Feb   3   2013
1   IL    Feb   3   2014

>DT2
state Month   Day   Year  Tavg
  IL    Jan    1    2013    13
  IL    Jan    2    2013    19
  IL    Jan    3    2013    22
  IL    Jan    4    2013    23
  IL    Jan    5    2013    26
  IL    Jan    6    2013    24
  IL    Jan    7    2013    27
  IL    Jan    8    2013    32
  IL    Jan    9    2013    36
  ...   ...    ..   ...      ... 
  ...   ...    ..   ...      ... 
  IL    Dec    31   2013    33

I would like to add the "Tavg" values of DT2 to the corresponding dates in DT1. For example, all entries in DT1 that fall on Jan 3 2013 need to have Tavg 22 in an additional column.

I tried setkey(DT1, state, Month, Day, Year), and the same for DT2, followed by the join DT1[DT2, nomatch=0, allow.cartesian=TRUE], but it didn't work.

asked Mar 16 '15 16:03

Gabriel

1 Answer

Just helped a friend with this (he couldn't find a good Stack Overflow answer) so I figured this question needed a more complete "toy" answer.

Here are a couple of simple data tables with one mismatched key:

library(data.table)

dt1 <- data.table(a = LETTERS[1:5], b = letters[1:5], c = 1:5)
dt2 <- data.table(c = LETTERS[c(1:4, 6)], b = letters[1:5], a = 6:10) # "F" has no match in dt1

And here are several multiple-key merge options:

merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"))             # Inner Join
merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"), all = TRUE) # Outer Join
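
To see what the two merge() calls keep, here is a small self-contained check on the same toy tables. The names inner, outer and unmatched are only illustrative, and the anti-join on the last line is an extra not used elsewhere in this answer; treat it as a sketch rather than the only way to inspect a join.

```r
library(data.table)

# Same toy tables as above, rebuilt so this snippet runs on its own
dt1 <- data.table(a = LETTERS[1:5], b = letters[1:5], c = 1:5)
dt2 <- data.table(c = LETTERS[c(1:4, 6)], b = letters[1:5], a = 6:10)

# Inner join keeps only key pairs present in both tables
inner <- merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"))

# Outer join keeps every key pair from either table, padding with NA
outer <- merge(dt1, dt2, by.x = c("a", "b"), by.y = c("c", "b"), all = TRUE)

# Anti-join: rows of dt1 whose (a, b) pair has no partner in dt2
unmatched <- dt1[!dt2, on = c(a = "c", b = "b")]

nrow(inner)   # number of matched key pairs
nrow(outer)   # matched pairs plus the unmatched row from each side
unmatched
```

Four key pairs match, so the inner join has four rows; the outer join adds the one unmatched row from each table, giving six.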

setkey(dt1,a,b)
setkey(dt2,c,b)

dt2[dt1] #Left Join (if dt1 is the "left" table)
dt1[dt2] #Right Join (if dt1 is the "left" table)
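
Applied to the tables in the question, the same idea might look like the sketch below. It assumes a data.table version that supports the on= argument (so no setkey is needed), and DT1 and DT2 here are small stand-ins built from a few of the rows shown in the question, not the full data. The names keys and joined are illustrative.

```r
library(data.table)

# Stand-in tables: a few rows copied from the question (not the full data)
DT1 <- data.table(
  ID    = 1L,
  state = "IL",
  Month = "Jan",
  Day   = c(3L, 3L, 3L, 10L),
  Year  = c(2013L, 2014L, 2014L, 2014L)
)
DT2 <- data.table(
  state = "IL",
  Month = "Jan",
  Day   = 1:4,
  Year  = 2013L,
  Tavg  = c(13L, 19L, 22L, 23L)
)

keys <- c("state", "Month", "Day", "Year")

# Left join with DT1 as the "left" table: one output row per DT1 row,
# repeated dates included; dates missing from DT2 get Tavg = NA
joined <- DT2[DT1, on = keys]

# Alternative: add Tavg to DT1 in place (update join); i.Tavg refers to
# the Tavg column of the table inside the brackets, here DT2
DT1[DT2, on = keys, Tavg := i.Tavg]

joined
DT1
```

Both forms keep every row of DT1, duplicates included. In this stand-in data only the Jan 3 2013 row finds a match; the 2014 rows get NA because the DT2 rows shown cover 2013 only.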

answered Oct 17 '22 15:10

D. Woods
