Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for 2D nearest-neighbour queries with dynamic points

I am trying to find a fast algorithm for finding the (approximate, if need be) nearest neighbours of a given point in a two-dimensional space where points are frequently removed from the dataset and new points are added.

(Relatedly, there are two variants of this problem that interest me: one in which points can be thought of as being added and removed randomly and another in which all the points are in constant motion.)

Some thoughts:

  • kd-trees offer good performance, but are only suitable for static point sets
  • R*-trees seem to offer good performance for a variety of dimensions, but the generality of their design (arbitrary dimensions, general content geometries) suggests the possibility that a more specific algorithm might offer performance advantages
  • Algorithms with existing implementations are preferable (though this is not necessary)

What's a good choice here?

like image 663
Richard Avatar asked Sep 17 '17 21:09

Richard


1 Answers

I agree with (almost) everything that @gsamaras said, just to add a few things:

  • In my experience (using large dataset with >= 500,000 points), kNN-performance of KD-Trees is worse than pretty much any other spatial index by a factor of 10 to 100. I tested them (2 KD-trees and various other indexes) on a large OpenStreetMap dataset. In the following diagram, the KD-Trees are called KDL and KDS, the 2D dataset is called OSM-P (left diagram):enter image description here The diagram is taken from this document, see bullet points below for more information.
  • This research describes an indexing method for moving objects, in case you keep (re-)inserting the same points in slightly different positions.
  • Quadtrees are not too bad either, they can be very fast in 2D, with excellent kNN performance for datasets < 1,000,000 entries.
  • If you are looking for Java implementations, have a look at my index library. In has implementations of quadtrees, R-star-tree, ph-tree, and others, all with a common API that also supports kNN. The library was written for the TinSpin, which is a framework for testing multidimensional indexes. Some results can be found enter link description here (it doesn't really describe the test data, but 'OSM-P' results are based on OpenStreetMap data with up to 50,000,000 2D points.
  • Depending on your scenario, you may also want to consider PH-Trees. They appear to be slower for kNN-queries than R-Trees in low dimensionality (though still faster than KD-Trees), but they are faster for removal and updates than RTrees. If you have a lot of removal/insertion, this may be a better choice (see the TinSpin results, Figures 2 and 46). C++ versions are available here and here.
like image 195
TilmannZ Avatar answered Sep 24 '22 02:09

TilmannZ