I have two data tables:
library(data.table)
d1 <- data.table(grp = c("a", "c", "b", "a"), val = c(2, 3, 6, 7), y1 = 1:4, y2 = 5:8)
d2 <- data.table(grp = rep(c("a", "b", "c"), 2),
from = rep(c(1, 5), each = 3), to = rep(c(4, 10), each = 3), z = 11:16)
I perform a non-equi join where the value 'val' in 'd1' should fall within the range defined by 'from' and 'to' in 'd2' for each group 'grp'.
d1[d2, on = .(grp, val >= from, val <= to), nomatch = 0]
#    grp val y1 y2 val.1  z
# 1:   a   1  1  5     4 11
# 2:   c   1  2  6     4 13
# 3:   a   5  4  8    10 14
# 4:   b   5  3  7    10 15
In the output, the join variables are from 'i' ('val' and 'val.1', holding the values of 'from' and 'to' in 'd2', respectively). However, I would like to have the join column of 'x' instead. Now, because...
"Columns of x can now be referred to using the prefix x. and is particularly useful during joining to refer to x's join columns as they are otherwise masked by i's."
...this could be achieved by specifying 'val = x.val' in 'j':
d1[d2, .(grp, val = x.val, z), on = .(grp, val >= from, val <= to), nomatch = 0]
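To make the masking explicit, here is a small sketch that returns the 'x' join column next to the 'i' range columns in one result. It reuses the example tables from above; the result name 'res' is only for illustration.

```r
library(data.table)

d1 <- data.table(grp = c("a", "c", "b", "a"), val = c(2, 3, 6, 7), y1 = 1:4, y2 = 5:8)
d2 <- data.table(grp = rep(c("a", "b", "c"), 2),
                 from = rep(c(1, 5), each = 3), to = rep(c(4, 10), each = 3), z = 11:16)

# x.val is the value from d1 (x); i.from and i.to are the range bounds from d2 (i)
res <- d1[d2, .(grp, val = x.val, from = i.from, to = i.to, z),
          on = .(grp, val >= from, val <= to), nomatch = 0]
res
```

Each returned 'val' lies between its 'from' and 'to', which confirms that 'x.val' picks up the original value from 'd1' rather than the range bound.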
To avoid typing all the non-join columns from 'x' (possibly many) in 'j', my current workaround is to join the result above back to the original data, which gives the desired result:
d1[d1[d2, .(grp, val = x.val, z), on = .(grp, val >= from, val <= to), nomatch = 0]
, on = .(grp, val)]
#    grp val y1 y2  z
# 1:   a   2  1  5 11
# 2:   c   3  2  6 13
# 3:   a   7  4  8 14
# 4:   b   6  3  7 15
However, this seems a bit clumsy. Thus my question: how can I select the join column from 'x' and all non-join columns from 'x' in 'j' in one go?
PS: I have considered switching the 'x' and 'i' data sets and the conditions in 'on'. Although that produces the desired join values, it still requires post-processing (deleting, renaming and reordering columns).
PS I have considered switching the x and i data sets, and the conditions in on. Although that produces the desired join values, it still requires post-processing (deleting, renaming and reordering of columns).
The amount of post-processing is limited by how many 'on=' columns there are:
d2[d1, on=.(grp, from <= val, to >= val), nomatch=0][,
`:=`(val = from, from = NULL, to = NULL)][]
That doesn't seem too bad.
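If the column order matters as well, the remaining clean-up can be sketched as below. This is one possible variant, not the only way: it renames and drops columns in separate steps and then calls setcolorder() with the full target order. The name 'res' is only for illustration.

```r
library(data.table)

d1 <- data.table(grp = c("a", "c", "b", "a"), val = c(2, 3, 6, 7), y1 = 1:4, y2 = 5:8)
d2 <- data.table(grp = rep(c("a", "b", "c"), 2),
                 from = rep(c(1, 5), each = 3), to = rep(c(4, 10), each = 3), z = 11:16)

# Swapped join: d2 is x, d1 is i, so 'from' and 'to' both carry d1's val
res <- d2[d1, on = .(grp, from <= val, to >= val), nomatch = 0]

setnames(res, "from", "val")   # keep one copy of the value under its original name
res[, to := NULL]              # drop the redundant second copy
setcolorder(res, c("grp", "val", "y1", "y2", "z"))  # d1's columns first, then z
res
```

The rows come back in the order of 'd1' because 'd1' is now the 'i' table.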
Following @Jaap's comment, here's another way: adding columns to 'd1' with an update join:
nm2 = setdiff(names(d2), c("from","to","grp"))
d1[d2, on=.(grp, val >= from, val <= to), (nm2) := mget(sprintf("i.%s", nm2))]
This makes sense here because the desired output is essentially 'd1' plus some columns from 'd2' (since each row of 'd1' matches at most one row of 'd2').
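The "at most one match" condition can be checked before relying on the update join. The sketch below counts matches per row of 'd1' with by = .EACHI and then runs the update on a copy, so the original 'd1' is left untouched. The names 'match_counts' and 'd1_upd' are only for illustration, and the single column 'z' is assigned directly instead of going through mget().

```r
library(data.table)

d1 <- data.table(grp = c("a", "c", "b", "a"), val = c(2, 3, 6, 7), y1 = 1:4, y2 = 5:8)
d2 <- data.table(grp = rep(c("a", "b", "c"), 2),
                 from = rep(c(1, 5), each = 3), to = rep(c(4, 10), each = 3), z = 11:16)

# Number of d2 ranges that each row of d1 falls into (0 means no match)
match_counts <- d2[d1, on = .(grp, from <= val, to >= val), .N, by = .EACHI]
stopifnot(all(match_counts$N <= 1L))

# := modifies by reference, so work on a copy to keep d1 as it was
d1_upd <- copy(d1)
d1_upd[d2, on = .(grp, val >= from, val <= to), z := i.z]
d1_upd
```

Rows of 'd1' without a matching range would keep NA in 'z', which differs from the nomatch = 0 joins above, where such rows are dropped.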
Perhaps use 'foverlaps' from data.table:
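# duplicate val so that each point becomes a zero-width interval [val, val1]
d1[, val1 := val]
# set keys; the last two key columns of each table define the intervals
setkey(d1, grp, val, val1)
setkey(d2, grp, from, to)
# overlap join, then drop the range columns and the helper column
d_merge <- foverlaps(d1, d2, nomatch = NA)
d_merge[, c("from", "to", "val1") := NULL]
d_merge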
#    grp  z val y1 y2
# 1:   a 11   2  1  5
# 2:   a 14   7  4  8
# 3:   b 15   6  3  7
# 4:   c 13   3  2  6