So lets say I have 10,000 points in A and 10,000 points in B and want to find out the closest point in A for every B point.
Currently, I simply loop through every point in B and A to find which one is closest in distance. ie.
B = [(.5, 1, 1), (1, .1, 1), (1, 1, .2)]
A = [(1, 1, .3), (1, 0, 1), (.4, 1, 1)]
C = {}
for bp in B:
closestDist = -1
for ap in A:
dist = sum(((bp[0]-ap[0])**2, (bp[1]-ap[1])**2, (bp[2]-ap[2])**2))
if(closestDist > dist or closestDist == -1):
C[bp] = ap
closestDist = dist
print C
However, I am sure there is a faster way to do this... any ideas?
The brute force method of finding the nearest of N points to a given point is O(N) -- you'd have to check each point. In contrast, if the N points are stored in a KD-tree, then finding the nearest point is on average O(log(N)) .
The closest pair is the minimum of the closest pairs within each half and the closest pair between the two halves. To split the point set in two, we find the x-median of the points and use that as a pivot. Finding the closest pair of points in each half is subproblem that is solved recursively.
I typically use a kd-tree in such situations.
There is a C++ implementation wrapped with SWIG and bundled with BioPython that's easy to use.
You could use some spatial lookup structure. A simple option is an octree; fancier ones include the BSP tree.
You could use numpy broadcasting. For example,
from numpy import *
import numpy as np
a=array(A)
b=array(B)
#using looping
for i in b:
print sum((a-i)**2,1).argmin()
will print 2,1,0 which are the rows in a that are closest to the 1,2,3 rows of B, respectively.
Otherwise, you can use broadcasting:
z = sum((a[:,:, np.newaxis] - b)**2,1)
z.argmin(1) # gives array([2, 1, 0])
I hope that helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With