 

How to join a data.table on multiple columns with multiple values

Here is an example:

library(data.table)
DT = data.table(x=1:4, y=6:9, z=3:6)
setkey(DT, x, y)

Each join column has multiple candidate values:

xc = c(1, 2, 4)
yc = c(6, 9)
DT[J(xc, yc), nomatch=0]
   x y z
1: 1 6 3

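This use of J() returns only a single row, because J() pairs the values element by element (x=1 with y=6, x=2 with y=9, and so on) instead of trying every combination. What I actually want is a join that behaves like the %in% operator: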

DT[x %in% xc & y %in% yc]
   x y z
1: 1 6 3
2: 4 9 6

However, using the %in% operator turns the search into a vector scan, which is much slower than a binary search on the key. To get a binary search, I build every possible combination of the join values:

xc2 = rep(xc, length(yc))
yc2 = unlist(lapply(yc, rep, length(xc)))
DT[J(xc2, yc2), nomatch=0]
   x y z
1: 1 6 3
2: 4 9 6
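To make the construction above concrete, here is a minimal base-R sketch (no data.table needed) showing what xc2 and yc2 contain. rep(..., each = ) and expand.grid() are swapped-in base-R alternatives that are not part of the original question; the sketch only checks that they produce the same pairs in the same order.

```r
xc <- c(1, 2, 4)
yc <- c(6, 9)

# Construction from the question: x cycles fastest, each y repeated length(xc) times
xc2 <- rep(xc, length(yc))
yc2 <- unlist(lapply(yc, rep, length(xc)))

# rep(..., each = ) expresses the y side more directly
yc2_each <- rep(yc, each = length(xc))

# expand.grid() builds the same grid of (x, y) pairs, first column varying fastest
grid <- expand.grid(x = xc, y = yc)
print(grid)
```

expand.grid() reads more clearly than the rep()/lapply() pair, but on its own it still leaves the join to be written by hand.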

Building xc2 and yc2 this way makes the code hard to read, though. Is there a better way to get the speed of binary search together with the simplicity of the %in% operator in this case?

Mert Nuhoglu asked Sep 01 '14 15:09



1 Answer

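Answering to remove this question from the open questions in the data.table tag.
The code from Arun's comment, DT[CJ(xc,yc), nomatch=0L], does the job: CJ() ("cross join") builds a table of every combination of xc and yc, and that table is then joined to DT with a binary search on the key.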
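A minimal, self-contained sketch of the CJ() approach, reusing the data from the question. The intermediate names combos, res and scanned are only for illustration. As in the question, xc and yc are doubles while the key columns are integers; data.table coerces whole-number doubles when joining.

```r
library(data.table)

DT <- data.table(x = 1:4, y = 6:9, z = 3:6)
setkey(DT, x, y)

xc <- c(1, 2, 4)
yc <- c(6, 9)

# CJ() builds all 3 x 2 = 6 (x, y) pairs, sorted by x then y
combos <- CJ(x = xc, y = yc)

# Keyed join: each pair is looked up by binary search; nomatch = 0L drops pairs with no match
res <- DT[combos, nomatch = 0L]
print(res)

# Vector-scan version from the question, for comparison
scanned <- DT[x %in% xc & y %in% yc]
```

Both res and scanned contain the rows (1, 6, 3) and (4, 9, 6). Only the CJ() version uses the key.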

jangorecki answered Nov 15 '22 07:11
