I am interested in good literature about spatial indexes. Which one are in use, comparisons between them in speed, space requirements, spatial queries performance when using them etc.
I used to use a kind of home-grown QuadTree for spatial indexing (well before I learned the word "quadtree"). For ordinary kinds of spatial data (I deal with street map data), they are fast to create and fast to query, but they scan too many leaf nodes during queries. Specifically, with reasonable node sizes (50-100), my quadtree tended to produce around 300 results for a point query, i.e. 3-6 leaf nodes apply (very rough ballpark; results are highly variable.)
Nowadays, my preferred data structure is the the R*tree. I wrote and tested an implementation myself that obtained very good results. My code for building an R*tree is very slow compared to my QuadTree code, but the bounding boxes on the leaf nodes end up very well organized; at least half of the query space is answered by only one leaf node (i.e. if you do a random point query, there is a good chance that only a single leaf node is returned), and something like 90% of the space is covered by two nodes or less. So with a node size of 80 elements, I'd typically get 80 or 160 results from a point query, with the average closer to 160 (since a few queries do return 3-5 nodes). This holds true even in dense urban areas of the map.
I know this because I wrote a visualizer for my R* tree and the graphical objects inside it, and I tested it on a large dataset (600,000 road segments). It performs even better on point data (and other data in which bounding boxes rarely overlap). If you implement an R* tree I urge you to visualize the results, because when I wrote mine it had multiple bugs that lowered the efficiency of the tree (without affecting correctness), and I was able to tweak some of the decision-making to get better results. Be sure to test on a large dataset, as it will reveal problems that a small dataset does not. It may help to decrease the fan-out (node size) of the tree for testing, to see how well the tree works when it is several levels deep.
I'd be happy to give you the source code except that I would need my employer's permission. You know how it is. In my implementation I support forced reinsertion, but my PickSplit and insertion penalty have been tweaked.
The original paper, The R* tree: An Efficient and Robust Access Method for Points and Rectangles, is missing dots for some reason (no periods and no dots on the "i"s). Also, their terminology is a bit weird, e.g. when they say "margin", what they mean is "perimeter".
The R* tree is a good choice if you need a data structure that can be modified. If you don't need to modify the tree after you first create it, consider bulk loading algorithms. If you only need to modify the tree a small amount after bulk loading, ordinary R-tree algorithms will be good enough. Note that R*-tree and R-tree data is structurally identical; only the algorithms for insertion (and maybe deletion? I forget) are different. R-tree is the original data structure from 1984; here's a link to the R-tree paper.
The kd-tree looks efficient and not too difficult to implement, but it can only be used for point data.
By the way, the reason I focus on leaf nodes so much is that
I developed a algorithm for quadrant based fast search and publushed it on ddj.com a couple of years ago. Maybe it's interesting for you:
Accelerated Search For the Nearest Line http://drdobbs.com/windows/198900559
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With