Merge data.table by two nearest variables

Question

I have two data tables with x,y coordinates and some other info which I would like to merge based on nearest neighbour distance, i.e. on the minimum in squared difference of both x and y (dx_i =min ([(x_i-x_j)^2+(y_i-y_j)^2]^0.5). Say I have the following two sets:

DT1=data.table(x=1:5,y=3:7)    
DT2=data.table(x=c(2,4,2,3,6),y=c(2.5,3.1,2,3,5),Q=c('a','b','c','d','e'))

Then the desired result of the merge would be:

   x y Q
1: 1 3 a
2: 2 4 d
3: 3 5 d
4: 4 6 e
5: 5 7 e

I could of course write a loop over DT1 to calculate the nearest neighbour for each row in DT1 and then merge based on this calculation, but that seems to defeat the purpose of data tables. Moreover, that will be very slow for data tables of several million rows.

I know that for a single column I could do a nearest neighbour merge like this

DT2[DT1,roll="nearest"]

But that (logically) doesn't work when I define 2 keys (x and y) for the tables to be merged. Does a similar syntax for a 2-parameter nearest neighbour merge exist? If not, is there a smarter way to do this then just looping, like I mentioned?

Colonel Beauvel · Accepted Answer

One possible solution:

func = function(u,v)
{
    vec = with(DT2, (u-x)^2 + (v-y)^2)
    DT2[which.min(vec),]$Q
}

transform(DT1, Q=apply(DT1, 1, function(u) func(u[1], u[2])))

#   x y Q
#1: 1 3 a
#2: 2 4 d
#3: 3 5 d
#4: 4 6 e
#5: 5 7 e

Merge data.table by two nearest variables

Tags:

merge

r

data.table

Michiel

1 Answers

Colonel Beauvel

Recent Activity

Donate For Us

Merge data.table by two nearest variables

Tags:

merge

r

data.table

Michiel

1 Answers

Colonel Beauvel

Related questions

Recent Activity

Donate For Us