I have two tables that I would like to join together in a way equivalent to the following SQL, where I join on multiple conditions, not just equality.
require(sqldf)
require(data.table)
dt <- data.table(num=c(1, 2, 3, 4, 5, 6),
char=c('A', 'A', 'A', 'B', 'B', 'B'),
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
dt_two <- data.table(
num =c(6, 1, 5, 2, 4, 3),
char=c('A', 'A', 'A', 'B', 'B', 'B'),
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
dt_out_sql <- sqldf('
select dtone.num, dtone.char, dtone.bool, SUM(dttwo.num) as SUM,
MIN(dttwo.num) as MIN
from dt as dtone INNER join dt_two as dttwo on
(dtone.char = dttwo.char) and
(dtone.num >= dttwo.num OR dtone.bool)
GROUP BY dtone.num, dtone.char, dtone.bool')
I would like to avoid the SQL solution, for both performance and flexibility reasons. The same goes for doing a cross join, and then filtering/aggregating -- it would create an intermediate table with lots of unnecessary records for me to filter out.
Thank you very much!
Update -- my initial example was done in haste. In my actual problem, I'm not doing a self join.
Here's one way:
require(data.table)
setkey(dt, char)
setkey(dt_two, char)
dt_two[dt, {
val = num[i.bool | i.num >= num];
list(num=i.num, bool=i.bool, sum=sum(val), min=min(val))
}, by=.EACHI]
# char num bool sum min
# 1: A 1 TRUE 12 1
# 2: A 2 FALSE 1 1
# 3: A 3 TRUE 12 1
# 4: B 4 FALSE 9 2
# 5: B 5 TRUE 9 2
# 6: B 6 FALSE 9 2
To read about by=.EACHI
, have a look at this post (until vignettes on joins are finished).
HTH
It's a bit ugly but works:
library(data.table)
library(sqldf)
dt <- data.table(num=c(1, 2, 3, 4, 5, 6),
char=c('A', 'A', 'A', 'B', 'B', 'B'),
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
dt_two <- data.table(
num =c(6, 1, 5, 2, 4, 3),
char=c('A', 'A', 'A', 'B', 'B', 'B'),
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
dt_out_sql <- sqldf('
select dtone.num,
dtone.char,
dtone.bool,
SUM(dttwo.num) as SUM,
MIN(dttwo.num) as MIN
from dt as dtone
INNER join dt_two as dttwo on
(dtone.char = dttwo.char) and
(dtone.num >= dttwo.num OR dtone.bool)
GROUP BY dtone.num, dtone.char, dtone.bool
')
setDT(dt_out_sql)
setkey(dt, char)
setkey(dt_two, char)
dt_out_r <- dt[dt_two,
list(dtone.num = num,
dttwo.num = i.num,
char,
bool) ,
nomatch = 0, allow.cartesian = T
][
dtone.num >= dttwo.num | bool,
list(SUM = sum(dttwo.num),
MIN = min(dttwo.num)),
by = list(num = dtone.num,
char,
bool)
]
setkey(dt_out_r, num, char, bool)
all.equal(dt_out_sql, dt_out_r, check.attributes = FALSE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With