Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given two (large) sets of points, how can I efficiently find pairs that are nearest to each other?

I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:

Given a set of points A and a set of points B in euclidean space, find all pairs (a,b) such that b is the closest point in B to a and a is the closest point in A to b.

The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.

The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?

like image 552
Ryan C. Thompson Avatar asked Feb 22 '11 10:02

Ryan C. Thompson


People also ask

How do you find the closest pair of points?

The closest pair is the minimum of the closest pairs within each half and the closest pair between the two halves. To split the point set in two, we find the x-median of the points and use that as a pivot. Finding the closest pair of points in each half is subproblem that is solved recursively.

What is the closest pair explain the closest pair algorithm?

In this problem, a set of n points are given on the 2D plane. In this problem, we have to find the pair of points, whose distance is minimum. To solve this problem, we have to divide points into two halves, after that smallest distance between two points is calculated in a recursive way.

How do you analyze brute force with the closest pair problem?

Brute-Force Method — Finding the Closest Pair The brute-force way is, like one that counts inversions in an array, to calculate the distances of every pair of points in the universe. For n number of points, we would need to measure n(n-1)/2 distances and the cost is square to n, or Θ(n²).


2 Answers

A data structure I found very useful when I had to do nearest neighbour searches was a kd-tree. Wikipedia has a nice overview and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already - you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neghbour searches to be performed in O(log N) time.

In that way, you could produce two lists of - the members of A and their nearest neighbour in B and the members of B and their nearest neighbour in A - in O(N log N) time. Then, you could compare the lists to see which pairs match. Done naively, that's O(N^2), though you might be able to think of a way to do it faster.

[edit] You've got me thinking; here's my second thought:

for(a in A)
    b := nearest(B, a)
    if a = nearest(A, b)
        add (a, b) to results
    end if
end for

function nearest(X, y)
    return nearest member of set X to point y
end function

By my reckoning, that's O(N log N).

like image 118
Scott Avatar answered Sep 21 '22 02:09

Scott


Sorry for picking up a rather old thread but I just wanted to add a solution I've found in my textbook for an Algorithm Design class:

There is a divide-and-conquer (think merge-sort) approach to solve this problem that should be O(n logn), I've only seen it for finding the shortest distance within one set of points but it should be easily adapted to require each pairing to consist of points from different sets.

  1. Sort all points according to X-value.
  2. Split the whole set in two equal parts.
  3. Recurse on each half and pick the minimal distance of the two (d)
  4. Find the right-most point (p) in the left half and check the distance for all points between p_x and p_x + d, if any of these distances are shorter than d that is the d to return, otherwise return d.
like image 28
Hobblin Avatar answered Sep 17 '22 02:09

Hobblin