What spatial indexing algorithm should I use?

I want to implement some kind of spatial indexing data structure for my MKAnnotations. Filtering them by a distance criterion is currently horribly slow (3-4k locations, done with a simple double for loop).

I'd like to create clusters of MKAnnotations, i.e. decide whether an annotation is close to another one. These locations also have a rough (creation) order, and a "previous"/"next" feature to jump between them would be useful (this is not a must). I've read about kd-tree and R-tree structures, and both seem to offer the fast distance/nearest-neighbor queries I need for filtering/clustering, but I'm not sure which is best for me, or whether there are other options. What algorithm/data structure should I use?

Update: I store these locations in a Core Data database; they represent a path. When the map is opened they are fetched into an array, and I then use that array for distance calculations and annotation creation. When the user moves/zooms the map, I loop through them and decide what needs to change on the map, so the whole thing is fairly static. As I understand it, if I were using a tree, I could store the locations in it and, whenever a zoom/move happens, just search it to obtain the locations in the new region. Is this true?

Even in the dynamic case, where I can add new locations to this array, it would be a single insertion, and that happens rarely.

Templar asked Oct 01 '12

1 Answer

It depends a lot on what your usage patterns are (for example, how many writes there are, and whether the index lives in memory or on disk) and what your data looks like (that is, how it is distributed).

R-trees are good because they are balanced and allow updates. In my experience the R*-tree is clearly better than the other variants because of its split strategy: it produces more square-shaped pages than the other strategies, so many queries need to scan fewer pages.

kd-trees are good if you are in-memory and static. Updating them is very bad; you will need to rebuild the index quite often.

And if your data does not change very often, bulk-loading the R-tree works very well. You can do Sort-Tile-Recursive bulk loading, which essentially requires (partially) sorting your data alternately on X and Y, so it takes a low O(n log n) to build the tree, very similar to bulk-loading a kd-tree, except that you multi-split instead of splitting in two. This approach is very popular.
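To make the tiling step concrete, here is a minimal sketch in Swift (chosen since the question is about MKAnnotations) of STR bulk loading at the leaf level. The leaf capacity pageSize and the use of plain CLLocationCoordinate2D values are assumptions, not part of the answer; upper tree levels would be built the same way over the bounding boxes of these leaves.

import Foundation
import CoreLocation

// Sort-Tile-Recursive (STR) leaf construction: sort by X, cut into vertical
// slices, sort each slice by Y, then cut each slice into pages.
func strLeaves(_ points: [CLLocationCoordinate2D], pageSize: Int = 16) -> [[CLLocationCoordinate2D]] {
    let n = points.count
    guard n > 0 else { return [] }

    // Number of leaf pages needed, and roughly sqrt(pages) vertical slices.
    let leafCount = Int(ceil(Double(n) / Double(pageSize)))
    let sliceCount = Int(ceil(Double(leafCount).squareRoot()))
    let sliceSize = sliceCount * pageSize

    // 1. Sort all points by longitude (X).
    let byX = points.sorted { $0.longitude < $1.longitude }

    var leaves: [[CLLocationCoordinate2D]] = []
    var i = 0
    while i < n {
        // 2. Take one vertical slice and sort it by latitude (Y).
        let slice = Array(byX[i..<min(i + sliceSize, n)]).sorted { $0.latitude < $1.latitude }
        // 3. Cut the slice into runs of pageSize points; each run is one leaf page.
        var j = 0
        while j < slice.count {
            leaves.append(Array(slice[j..<min(j + pageSize, slice.count)]))
            j += pageSize
        }
        i += sliceSize
    }
    return leaves
}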

Furthermore, you can keep track of the number of objects in each page. When displaying things on a map, you may want to stop early when a page would display too small on the screen (i.e. smaller than a marker). At that point, you do not scan the page; you just take its object count and display it as a clustered marker until the user zooms in.
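A rough sketch of that early-stopping traversal, assuming a hypothetical IndexNode type that caches its subtree count and bounding rect; the minPageSize threshold is likewise an assumed tuning knob:

import MapKit

// A tree page: either an inner node with children, or a leaf with annotations.
final class IndexNode {
    var rect: MKMapRect                  // bounding box of everything below this node
    var children: [IndexNode] = []       // empty for leaf nodes
    var annotations: [MKAnnotation] = [] // filled only for leaf nodes
    var count: Int = 0                   // total number of annotations in this subtree
    init(rect: MKMapRect) { self.rect = rect }
}

// Collect individual annotations plus (center, count) cluster stubs for the
// visible region, stopping at pages that would draw smaller than a marker.
func visibleItems(node: IndexNode,
                  visible: MKMapRect,
                  minPageSize: Double) -> (single: [MKAnnotation], clusters: [(MKMapPoint, Int)]) {
    guard node.rect.intersects(visible) else { return ([], []) }

    // Page too small to show its contents individually: emit one cluster marker.
    if max(node.rect.size.width, node.rect.size.height) < minPageSize {
        let center = MKMapPoint(x: node.rect.midX, y: node.rect.midY)
        return ([], [(center, node.count)])
    }

    if node.children.isEmpty {
        return (node.annotations, [])
    }

    var singles: [MKAnnotation] = []
    var clusters: [(MKMapPoint, Int)] = []
    for child in node.children {
        let r = visibleItems(node: child, visible: visible, minPageSize: minPageSize)
        singles.append(contentsOf: r.single)
        clusters.append(contentsOf: r.clusters)
    }
    return (singles, clusters)
}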

For 2D data with a limited value domain, do not overlook the simple things. Quadtrees can work really well, too! Simplicity can make it a lot easier to optimize things. Or try a classic grid approach: if your users tend to spread their annotations over an area (rather than putting them all in one place), you can just compute integer x,y grid coordinates, hash them, and keep a list of annotations for each grid cell.
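And a minimal sketch of the grid idea, with a hypothetical GridCell key and an assumed cell size in degrees (a real implementation would derive the cell size from the zoom level):

import Foundation
import CoreLocation

// Integer grid cell key; Hashable so it can be used as a dictionary key.
struct GridCell: Hashable {
    let x: Int
    let y: Int
}

// Bucket coordinates by their grid cell.
func buildGrid(_ coords: [CLLocationCoordinate2D],
               cellSizeDegrees: Double = 0.01) -> [GridCell: [CLLocationCoordinate2D]] {
    var grid: [GridCell: [CLLocationCoordinate2D]] = [:]
    for c in coords {
        let cell = GridCell(x: Int(floor(c.longitude / cellSizeDegrees)),
                            y: Int(floor(c.latitude / cellSizeDegrees)))
        grid[cell, default: []].append(c)
    }
    return grid
}

// Neighbour lookup only touches the 3x3 block of cells around the query point
// instead of scanning every annotation.
func nearby(_ point: CLLocationCoordinate2D,
            in grid: [GridCell: [CLLocationCoordinate2D]],
            cellSizeDegrees: Double = 0.01) -> [CLLocationCoordinate2D] {
    let cx = Int(floor(point.longitude / cellSizeDegrees))
    let cy = Int(floor(point.latitude / cellSizeDegrees))
    var result: [CLLocationCoordinate2D] = []
    for dx in -1...1 {
        for dy in -1...1 {
            result.append(contentsOf: grid[GridCell(x: cx + dx, y: cy + dy)] ?? [])
        }
    }
    return result
}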

Has QUIT--Anony-Mousse answered Nov 02 '22