Join tables by date range [duplicate]

Tags: date, merge, join, range, r

I am looking for a simple method to join two tables by date range. One table contains an exact date; the other contains two variables identifying the beginning and ending of a time period. I need to join the tables where the date in the first table is within the range from the second table.

data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                name = c('id1','id2','id3','id4'))


data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                class = c(1,2,3,4))

result <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                 beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                 ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                 name = c('id1','id2','id3','id4'),
                 class = c(1,2,3,4))

Any help, please? I found a few complicated examples, but they don't even work on my data because of the date formats. I need something like:

select * from data1
left join
select * from data2
where data2.beginning <= data1.date <= data2.ending

Thanks

asked May 30 '14 16:05

Residium

1 Answer

I know the following looks horrible in base R, but here's what I came up with. It's better to use the 'sqldf' package (see below).

library(data.table)
data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                    name = c('id1','id2','id3','id4'))


data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                    ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                    class = c(1,2,3,4))

# For each date in data1, find the first row of data2 whose range contains it (NA if none).
# ISO 'YYYY-MM-DD' strings sort chronologically, so comparing them as text is safe here.
idx <- vapply(data1$date,
              function(d) which(data2$beginning <= d & data2$ending >= d)[1],
              integer(1), USE.NAMES = FALSE)
result <- cbind(data1, data2[idx])
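For comparison, here is a base R sketch of the same range lookup that needs no extra packages. It builds the Cartesian product of the two tables with merge(by = NULL), keeps the pairs where the date falls inside the range (endpoints included), and then merges back so that dates without a matching range would still be kept. The data frames df1 and df2 mirror the question's data with the dates converted via as.Date; the names df1, df2, pairs and matched are illustrative, not from the original post.

```r
df1 <- data.frame(date = as.Date(c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09')),
                  name = c('id1', 'id2', 'id3', 'id4'))
df2 <- data.frame(beginning = as.Date(c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05')),
                  ending    = as.Date(c('2010-01-22', '2010-01-29', '2010-02-04', '2010-02-13')),
                  class     = c(1, 2, 3, 4))

# Cross join: every row of df1 paired with every row of df2.
pairs <- merge(df1, df2, by = NULL)

# Keep only the pairs where the date lies within [beginning, ending].
matched <- pairs[pairs$date >= pairs$beginning & pairs$date <= pairs$ending, ]

# Left join back onto df1 so dates with no matching range get NA columns.
result <- merge(df1, matched, by = c("date", "name"), all.x = TRUE)
```

The cross join grows as nrow(df1) * nrow(df2), so this is only reasonable for small tables; for larger data the sqldf or data.table approaches are a better fit.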

Using the package sqldf:

library(sqldf)
result = sqldf("select * from data1
                left join data2
                on data1.date between data2.beginning and data2.ending")

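Using data.table, this is simply a non-equi join. The date columns need to be converted to Date first (for example with as.Date), because data.table supports only equality joins on character columns. Note that this form keeps every row of data2 rather than data1: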

data1[data2, on = .(date >= beginning, date <= ending)]
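To get the left join on data1 that the question asks for, the join can be written the other way round. The following is a minimal sketch, assuming the dates are stored as Date rather than character. It uses data.table's x. and i. column prefixes to pick the original columns explicitly, because a non-equi join otherwise reports the join columns with the values from the table inside the brackets.

```r
library(data.table)

data1 <- data.table(date = as.Date(c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09')),
                    name = c('id1', 'id2', 'id3', 'id4'))
data2 <- data.table(beginning = as.Date(c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05')),
                    ending    = as.Date(c('2010-01-22', '2010-01-29', '2010-02-04', '2010-02-13')),
                    class     = c(1, 2, 3, 4))

# data2[data1, ...] keeps every row of data1 (a left join on data1).
# x.* refers to columns of data2, i.* to columns of data1.
result <- data2[data1,
                on = .(beginning <= date, ending >= date),
                .(date = i.date,
                  beginning = x.beginning,
                  ending = x.ending,
                  name = i.name,
                  class = x.class)]
```

Dates in data1 that fall outside every range would appear with NA in beginning, ending and class.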

answered Nov 30 '22 19:11

nfmcclure
