I am looking for simple method to join two tables by date range. 1 table contains exact date, another table contains two variables identifying beginning and ending of the time period. I need to join tables if date in first table is withing range from second table.
data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
name = c('id1','id2','id3','id4'))
data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'),
ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
class = c(1,2,3,4))
result <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'),
ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
name = c('id1','id2','id3','id4'),
class = c(1,2,3,4))
Any help please? I found few difficult examples but they don't even work on my data because of formats. I need something like:
select * from data1
left join
select * from data2
where data2.beginning <= data1.date <= data2.ending
Thanks
There are four main types of JOINs in SQL: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN.
As known, there are five types of join operations: Inner, Left, Right, Full and Cross joins.
The answer to this question is yes, you can join two unrelated tables in SQL, and in fact, there are multiple ways to do this, particularly in the Microsoft SQL Server database. The most common way to join two unrelated tables is by using CROSS join, which produces a cartesian product of two tables.
Join is a binary operation. More than two tables can be combined using multiple join operations. Understanding the join function is fundamental to understanding relational databases, which are made up of many tables.
I know the following looks horrible in base, but here's what I came up with. It's better to use the 'sqldf' package (see below).
library(data.table)
data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
name = c('id1','id2','id3','id4'))
data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'),
ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
class = c(1,2,3,4))
result <- cbind(data1,"beginning"=sapply(1:nrow(data2),function(x) data2$beginning[data2$beginning[x]<data1$date & data2$ending[x]>data1$date]),
"ending"=sapply(1:nrow(data2),function(x) data2$ending[data2$beginning[x]<data1$date & data2$ending[x]>data1$date]),
"class"=sapply(1:nrow(data2),function(x) data2$class[data2$beginning[x]<data1$date & data2$ending[x]>data1$date]))
Using the package sqldf:
library(sqldf)
result = sqldf("select * from data1
left join data2
on data1.date between data2.beginning and data2.ending")
Using data.table this is simply
data1[data2, on = .(date >= beginning, date <= ending)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With