
How to do non-equi join with variable column name

Tags: r, data.table

How can I do a non-equi join in data.table 1.9.7 with a variable column name? For example,

Known column name:

library(data.table)
dt <- data.table(x=round(rnorm(10)), y=rnorm(10))
binDT <- data.table(LB=c(-Inf, -1, 0, .2, .7, 1.5, 3), RB=c(-1, 0, .2, .7, 1.5, 3, Inf))
dt[binDT, on=.(x>=LB, x<RB)]
       x       y  x.1
 1: -Inf  2.2669 -1.0
 2: -1.0 -0.5453  0.0
 3: -1.0  0.5125  0.0
 4:  0.0  1.4151  0.2
 5:  0.0 -0.1440  0.2
 6:  0.0 -1.1802  0.2
 7:  0.0  0.3338  0.2
 8:  0.0 -1.8220  0.2
 9:  0.2      NA  0.7
10:  0.7  0.3155  1.5
11:  0.7 -0.6284  1.5
12:  1.5      NA  3.0
13:  3.0      NA  Inf

Variable column name:

colName <- "x"
dt[binDT, on=.(get(colName)>=LB, get(colName)<RB)]  # Error
dt[binDT, on=eval(parse(text="list(x>=LB, x<RB)"))]  # Error
Ben asked Sep 18 '16 06:09



2 Answers

@Shape's answer is fine, but there is an easier way to achieve it. The on argument also accepts a character vector, so you can build the join conditions by pasting the column name together with the operators.

colName="x"
on=sprintf(c("%s>=LB","%s<RB"), colName)
print(on)
#[1] "x>=LB" "x<RB"
dt[binDT, on=on]
#       x           y  x.1
# 1: -Inf          NA -1.0
# 2: -1.0  0.48127355  0.0
# 3:  0.0  0.11779604  0.2
# 4:  0.0 -0.97891522  0.2
# 5:  0.0 -0.05969859  0.2
# 6:  0.0 -0.05625401  0.2
# 7:  0.2          NA  0.7
# 8:  0.7 -0.84438216  1.5
# 9:  0.7  0.80151913  1.5
#10:  1.5 -0.11013456  3.0
#11:  1.5  0.82139242  3.0
#12:  3.0 -1.24386831  Inf
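
If the same pattern is needed for more than one column, the sprintf step can be wrapped in a small helper. The following is only a sketch: interval_on is a hypothetical name (not part of data.table), and it assumes the bin table stores its bounds in columns called LB and RB, as in the question. The data is a small fixed set rather than rnorm output, so the result is reproducible.

```r
library(data.table)

# Hypothetical helper (not a data.table function): builds the character
# vector for `on` describing the half-open interval [lower, upper).
interval_on <- function(col, lower = "LB", upper = "RB") {
  c(sprintf("%s>=%s", col, lower), sprintf("%s<%s", col, upper))
}

# Fixed toy data: each x falls into exactly one of the first six bins,
# and no x falls into the last bin [3, Inf).
dt <- data.table(x = c(-2, -0.5, 0.1, 0.5, 1, 2), y = 1:6)
binDT <- data.table(LB = c(-Inf, -1, 0, .2, .7, 1.5, 3),
                    RB = c(-1, 0, .2, .7, 1.5, 3, Inf))

colName <- "x"
on <- interval_on(colName)   # same strings as the sprintf call above
res <- dt[binDT, on = on]    # one row per match; an empty bin yields an NA row
print(res)
```

Because the conditions are plain strings, the helper can be reused for any column name without touching the join call itself.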
jangorecki answered Nov 13 '22 10:11


Use substitute with dummy variable names and feed it a named list:

res <- substitute(dt[binDT,on=.(A>=LB,B<RB)],
                  list(A = as.name(colName), 
                       B = as.name(colName)))

# the values get replaced in the call
res
# dt[binDT, on = .(x >= LB, x < RB)]

eval(res)

       x           y  x.1
 1: -Inf          NA -1.0
 2: -1.0  0.69668714  0.0
 3: -1.0 -0.03824623  0.0
 4:  0.0  0.91269554  0.2
 5:  0.0  0.42322463  0.2
 6:  0.0 -0.22891670  0.2
 7:  0.0  0.61413004  0.2
 8:  0.2          NA  0.7
 9:  0.7 -1.47526635  1.5
10:  0.7 -1.12899562  1.5
11:  0.7  1.05462948  1.5
12:  1.5 -0.04467894  3.0
13:  3.0          NA  Inf
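
The substitute approach can also be wrapped in a function so the column name is passed as an argument. This is a minimal sketch under a few assumptions: bin_join is a hypothetical name, the bin table is assumed to have LB and RB columns as in the question, and a single placeholder A is used for both conditions since they refer to the same column. It then checks that the result matches the join written with the literal column name.

```r
library(data.table)

set.seed(1)  # fixed seed so the comparison is reproducible
dt <- data.table(x = round(rnorm(10)), y = rnorm(10))
binDT <- data.table(LB = c(-Inf, -1, 0, .2, .7, 1.5, 3),
                    RB = c(-1, 0, .2, .7, 1.5, 3, Inf))

# Hypothetical wrapper: replaces the placeholder A with the column named
# in `col`, then evaluates the resulting call inside the function.
bin_join <- function(DT, bins, col) {
  cl <- substitute(DT[bins, on = .(A >= LB, A < RB)],
                   list(A = as.name(col)))
  eval(cl)
}

via_substitute <- bin_join(dt, binDT, "x")
direct <- dt[binDT, on = .(x >= LB, x < RB)]

# Both routes should give the same table
print(all.equal(via_substitute, direct))
```

Only A is listed in the substitution list, so DT and bins stay as ordinary symbols and are looked up in the function's own environment when the call is evaluated.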
Shape answered Nov 13 '22 11:11