Merge data.table by two nearest variables

I have two data tables with x,y coordinates and some other info, which I would like to merge based on nearest-neighbour distance. That is, for each row i of the first table I want the row j of the second table that minimises the Euclidean distance: d_i = min_j [(x_i - x_j)^2 + (y_i - y_j)^2]^0.5. Say I have the following two sets:

library(data.table)
DT1=data.table(x=1:5,y=3:7)
DT2=data.table(x=c(2,4,2,3,6),y=c(2.5,3.1,2,3,5),Q=c('a','b','c','d','e'))

Then the desired result of the merge would be:

   x y Q
1: 1 3 a
2: 2 4 d
3: 3 5 d
4: 4 6 e
5: 5 7 e

I could of course write a loop over DT1 to calculate the nearest neighbour for each row in DT1 and then merge based on this calculation, but that seems to defeat the purpose of data tables. Moreover, that will be very slow for data tables of several million rows.

I know that for a single column I could do a nearest-neighbour merge like this:

DT2[DT1,roll="nearest"]

But that (logically) doesn't work when I define two keys (x and y) for the tables to be merged. Does a similar syntax exist for a nearest-neighbour merge on two columns? If not, is there a smarter way to do this than just looping, as I mentioned?
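To make the limitation concrete, here is a minimal sketch of what a two-key rolling join does with the example data. It relies on the data.table behaviour that `roll` applies only to the last join column, while earlier key columns must match exactly. The integer columns of DT1 are converted to numeric here purely as an assumption, to keep the join column types identical; `res` is just an illustrative name.

```r
library(data.table)

# Same data as in the question; DT1 is stored as numeric (an assumption)
# so that its column types match those of DT2.
DT1 <- data.table(x = as.numeric(1:5), y = as.numeric(3:7))
DT2 <- data.table(x = c(2, 4, 2, 3, 6), y = c(2.5, 3.1, 2, 3, 5),
                  Q = c("a", "b", "c", "d", "e"))

# Key DT2 on both coordinates and attempt the rolling join.
setkey(DT2, x, y)
res <- DT2[DT1, roll = "nearest"]

# x is matched exactly and only y is rolled to the nearest value,
# so rows of DT1 whose x does not occur in DT2 (x = 1 and x = 5) get NA for Q.
print(res)
```

The result is therefore a nearest match in y within each exact x group, not a nearest neighbour in the (x, y) plane.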

Michiel asked Feb 10 '15 15:02


1 Answer

One possible solution:

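# Return Q from the row of DT2 closest to the point (u, v).
# Squared distance is enough: it has the same minimiser as the distance itself.
func <- function(u, v)
{
    vec <- with(DT2, (u - x)^2 + (v - y)^2)
    DT2$Q[which.min(vec)]
}

# Apply func to every row of DT1 (u[1] is x, u[2] is y) and attach the result as Q.
transform(DT1, Q = apply(DT1, 1, function(u) func(u[1], u[2])))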

#   x y Q
#1: 1 3 a
#2: 2 4 d
#3: 3 5 d
#4: 4 6 e
#5: 5 7 e
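
The same idea can also be written without calling a function once per row of DT1. The following is a minimal sketch, not the code from the answer above: it builds the full matrix of squared distances with `outer()` and picks the column of the minimum in each row. The names `d2` and `idx` are illustrative.

```r
library(data.table)

DT1 <- data.table(x = 1:5, y = 3:7)
DT2 <- data.table(x = c(2, 4, 2, 3, 6), y = c(2.5, 3.1, 2, 3, 5),
                  Q = c("a", "b", "c", "d", "e"))

# nrow(DT1) x nrow(DT2) matrix of squared Euclidean distances:
# entry [i, j] is the squared distance between DT1 row i and DT2 row j.
d2 <- outer(DT1$x, DT2$x, "-")^2 + outer(DT1$y, DT2$y, "-")^2

# For each row of DT1, the index of the closest row of DT2
# (which.min returns the first index in case of ties).
idx <- apply(d2, 1, which.min)

# Add the matched Q to DT1 by reference.
DT1[, Q := DT2$Q[idx]]
print(DT1)
```

The distance matrix needs memory proportional to nrow(DT1) * nrow(DT2), so this is only practical for moderately sized tables. For several million rows, a k-d tree search such as `nn2()` from the RANN package is probably a better fit, since it avoids computing all pairwise distances.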
Colonel Beauvel answered Oct 04 '22 10:10