Grouping set of points to nearest pairs

Question

I need an algorithm for the following problem:

I'm given a set of 2D points P = { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) } on a plane. I need to group them in pairs in the following manner:

Find two closest points (x_a, y_a) and (x_b, y_b) in P.
Add the pair <(x_a, y_a), (x_b, y_b)> to the set of results R.
Remove <(x_a, y_a), (x_b, y_b)> from P.
If initial set P is not empty, go to the step one.
Return set of pairs R.

That naive algorithm is O(n^3), using faster algorithm for searching nearest neighbors it can be improved to O(n^2 logn). Could it be made any better?

And what if the points are not in the euclidean space?

An example (resulting groups are circled by red loops):

enter image description here

btilly · Accepted Answer

Put all of the points into an http://en.wikipedia.org/wiki/R-tree (time O(n log(n))) then for each point calculate the distance to its nearest neighbor. Put points and initial distances into a priority queue. Initialize an empty set of removed points, and an empty set of pairs. Then do the following pseudocode:

while priority_queue is not empty:
    (distance, point) = priority_queue.get();
    if point in removed_set:
        continue
    neighbor = rtree.find_nearest_neighbor(point)
    if distance < distance_between(point, neighbor):
        # The previous neighbor was removed, find the next.
        priority_queue.add((distance_between(point, neighbor), point)
    else:
        # This is the closest pair.
        found_pairs.add(point, neighbor)
        removed_set.add(point)
        removed_set.add(neighbor)
        rtree.remove(point)
        rtree.remove(neighbor)

The slowest part of this is the nearest neighbor searches. An R-tree does not guarantee that those nearest neighbor searches will be O(log(n)). But they tend to be. Furthermore you are not guaranteed that you will do O(1) neighbor searches per point. But typically you will. So average performance should be O(n log(n)). (I might be missing a log factor.)

Yves Daoust · Answer

This problem calls for a dynamic Voronoi diagram I guess.

When the Voronoi diagram of a point set is known, the nearest neighbor pair can be found in linear time.

Then deleting these two points can be done in linear or sublinear time (I didn't find precise info on that).

So globally you can expect an O(N²) solution.

user2566092 · Answer

If your distances are arbitrary and you can't embed your points into Euclidean space (and/or the dimension of the space would be really high), then there's basically no way around at least a quadratic time algorithm because you don't know what the closest pair is until you check all the pairs. It is easy to get very close to this, basically by sorting all pairs according to distance and then maintaining a boolean look up table indicating which points in your list have already been taken, and then going through the list of sorted pairs in order and adding a pair of points to your "nearest neighbors" if neither point in the pair is in the look up table of taken points, and then adding both points in the pair to the look up table if so. Complexity O(n^2 log n), with O(n^2) extra space.

Grouping set of points to nearest pairs

Tags:

algorithm

computational-geometry

nearest-neighbor

ciechowoj

3 Answers

btilly

Yves Daoust

user2566092

Recent Activity

Donate For Us

Grouping set of points to nearest pairs

Tags:

algorithm

computational-geometry

nearest-neighbor

ciechowoj

3 Answers

btilly

Yves Daoust

user2566092

Related questions

Recent Activity

Donate For Us