Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping set of points to nearest pairs

I need an algorithm for the following problem:

I'm given a set of 2D points P = { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) } on a plane. I need to group them in pairs in the following manner:

  1. Find two closest points (x_a, y_a) and (x_b, y_b) in P.
  2. Add the pair <(x_a, y_a), (x_b, y_b)> to the set of results R.
  3. Remove <(x_a, y_a), (x_b, y_b)> from P.
  4. If initial set P is not empty, go to the step one.
  5. Return set of pairs R.

That naive algorithm is O(n^3), using faster algorithm for searching nearest neighbors it can be improved to O(n^2 logn). Could it be made any better?

And what if the points are not in the euclidean space?

An example (resulting groups are circled by red loops):

enter image description here

like image 615
ciechowoj Avatar asked May 27 '15 17:05

ciechowoj


3 Answers

Put all of the points into an http://en.wikipedia.org/wiki/R-tree (time O(n log(n))) then for each point calculate the distance to its nearest neighbor. Put points and initial distances into a priority queue. Initialize an empty set of removed points, and an empty set of pairs. Then do the following pseudocode:

while priority_queue is not empty:
    (distance, point) = priority_queue.get();
    if point in removed_set:
        continue
    neighbor = rtree.find_nearest_neighbor(point)
    if distance < distance_between(point, neighbor):
        # The previous neighbor was removed, find the next.
        priority_queue.add((distance_between(point, neighbor), point)
    else:
        # This is the closest pair.
        found_pairs.add(point, neighbor)
        removed_set.add(point)
        removed_set.add(neighbor)
        rtree.remove(point)
        rtree.remove(neighbor)

The slowest part of this is the nearest neighbor searches. An R-tree does not guarantee that those nearest neighbor searches will be O(log(n)). But they tend to be. Furthermore you are not guaranteed that you will do O(1) neighbor searches per point. But typically you will. So average performance should be O(n log(n)). (I might be missing a log factor.)

like image 173
btilly Avatar answered Nov 20 '22 17:11

btilly


This problem calls for a dynamic Voronoi diagram I guess.

When the Voronoi diagram of a point set is known, the nearest neighbor pair can be found in linear time.

Then deleting these two points can be done in linear or sublinear time (I didn't find precise info on that).

So globally you can expect an O(N²) solution.

like image 27
Yves Daoust Avatar answered Nov 20 '22 17:11

Yves Daoust


If your distances are arbitrary and you can't embed your points into Euclidean space (and/or the dimension of the space would be really high), then there's basically no way around at least a quadratic time algorithm because you don't know what the closest pair is until you check all the pairs. It is easy to get very close to this, basically by sorting all pairs according to distance and then maintaining a boolean look up table indicating which points in your list have already been taken, and then going through the list of sorted pairs in order and adding a pair of points to your "nearest neighbors" if neither point in the pair is in the look up table of taken points, and then adding both points in the pair to the look up table if so. Complexity O(n^2 log n), with O(n^2) extra space.

like image 1
user2566092 Avatar answered Nov 20 '22 16:11

user2566092