Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Merge 2 Data Frame by Multiple Condition Using Dates & ID

I am trying to merge 2 data frame using multiple conditions and have used merge command but unable to get a successful output.

#Data Frame df1#
ID<- c("A1", "A2","A3", "A4")
Location <- c("012A","234B","012A","238C" )
startdate <- as.Date(c("2014-11-01","2014-01-01","2015-10-01", "2015-01-01"))
enddate <- as.Date(c("2014-12-31","2014-08-31","2015-12-31","2015-12-31"))
df1<- data.frame(ID,Location, startdate, enddate)

#Data Frame df2#
ID<-c("A1", "A1", "A4")
N<- c(2,1,2)
Loss_Date <- as.Date(c("2014-11-15", "2015-12-25", "2015-11-30"))
Amt<-c("2200","1000", "500")
df2<- data.frame(ID, N, Loss_Date,Amt)

I want to merge these 2 dataframe by using Location as common column and Loss_Date in df2 lies between (inclusive) Start_Date and End_Date in df2. You can see that second entry in df2 doesnt get mapped as the date is not in range of df1

#Required Output
ID<- c("A1", "A2","A3", "A4")
Location <- c("012A","234B","012A","238C" )
startdate <- as.Date(c("2014-11-01","2014-01-01","2015-10-01", "2015-01-01"))
enddate <- as.Date(c("2014-12-31","2014-08-31","2015-12-31","2015-12-31"))
N<-c(2,0,0,2)
Loss_Date <- c("2014-11-15", "NA", "NA", "2015-11-30")
Amt<-c("2200","0","0", "500")
Output<- data.frame(ID,Location, startdate, enddate,N, Loss_Date,Amt)

I created a common ID using the Year and ID but get the wrong mapping. Tried various ways to use merge and match but the command doesn work. I need this to run on over 170K observations. Both data frame are of unequal length. Any help would be really appreciated.

like image 467
Adi Avatar asked Dec 19 '22 00:12

Adi


1 Answers

In the current development version of data.table (v1.9.7), non-equi joins are implemented. With that we can do:

require(data.table) # v1.9.7+
setDT(df2)[df1, .(ID, Location, startdate, enddate, N, x.Loss_Date, Amt), 
                      on=.(ID, Loss_Date>=startdate, Loss_Date<=enddate)]
#    ID Location  startdate    enddate  N x.Loss_Date  Amt
# 1: A1     012A 2014-11-01 2014-12-31  2  2014-11-15 2200
# 2: A2     234B 2014-01-01 2014-08-31 NA        <NA>   NA
# 3: A3     012A 2015-10-01 2015-12-31 NA        <NA>   NA
# 4: A4     238C 2015-01-01 2015-12-31  2  2015-11-30  500
like image 62
Arun Avatar answered Feb 20 '23 12:02

Arun