
Good books/articles about spatial indexes [closed]

I am interested in good literature about spatial indexes: which ones are in use, how they compare in speed and space requirements, how spatial queries perform when using them, etc.

asked Apr 11 '11 by watbywbarif


2 Answers

I used to use a kind of home-grown QuadTree for spatial indexing (well before I learned the word "quadtree"). For ordinary kinds of spatial data (I deal with street map data), quadtrees are fast to create and fast to query, but they scan too many leaf nodes during queries. Specifically, with reasonable node sizes (50-100 items), my quadtree tended to produce around 300 results for a point query, i.e. 3-6 leaf nodes matched (a very rough ballpark; results are highly variable).
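The answer doesn't say how that home-grown quadtree stored its items, so the following is only a hypothetical reconstruction: each rectangle is routed to the quadrant containing its center, and every node records the union bounding box of what it holds. Because those boxes can spill past the quadrant boundaries, a single point query can touch several leaves, which is the effect described above. `QuadNode`, `capacity` and `max_depth` are illustrative names, not taken from the original.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def contains_point(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]

@dataclass
class QuadNode:
    cell: Rect                        # fixed quadrant this node is responsible for
    capacity: int                     # max items in a leaf before it splits
    depth: int = 0
    mbr: Optional[Rect] = None        # union of stored boxes; may extend past `cell`
    items: List[Rect] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None

    def insert(self, r: Rect, max_depth: int = 16) -> None:
        self.mbr = r if self.mbr is None else union(self.mbr, r)
        if self.children is not None:
            self._child_for(r).insert(r, max_depth)
            return
        self.items.append(r)
        if len(self.items) > self.capacity and self.depth < max_depth:
            self._split(max_depth)

    def _child_for(self, r: Rect) -> "QuadNode":
        # Route by the item's center, so each item lives in exactly one leaf.
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        mx = (self.cell[0] + self.cell[2]) / 2
        my = (self.cell[1] + self.cell[3]) / 2
        return self.children[(1 if cx >= mx else 0) + (2 if cy >= my else 0)]

    def _split(self, max_depth: int) -> None:
        x0, y0, x1, y1 = self.cell
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quads = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
        self.children = [QuadNode(q, self.capacity, self.depth + 1) for q in quads]
        items, self.items = self.items, []
        for r in items:
            self._child_for(r).insert(r, max_depth)

    def point_query(self, x: float, y: float, leaves: List["QuadNode"]) -> None:
        # Descend into every node whose items' box contains the point; because
        # these boxes overlap, several leaves can qualify for one point.
        if self.mbr is None or not contains_point(self.mbr, x, y):
            return
        if self.children is None:
            leaves.append(self)
        else:
            for child in self.children:
                child.point_query(x, y, leaves)
```

Checking `len(leaves)` after each query on your own data is a cheap way to reproduce the "3-6 leaf nodes" kind of measurement.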

Nowadays, my preferred data structure is the R* tree. I wrote and tested an implementation myself that obtained very good results. My code for building an R* tree is very slow compared to my QuadTree code, but the bounding boxes on the leaf nodes end up very well organized: at least half of the query space is answered by only one leaf node (i.e. if you do a random point query, there is a good chance that only a single leaf node is returned), and something like 90% of the space is covered by two nodes or fewer. So with a node size of 80 elements, I'd typically get 80 or 160 results from a point query, with the average closer to 160 (since a few queries do return 3-5 nodes). This holds true even in dense urban areas of the map.
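The "fraction of the query space answered by one leaf" figure can be estimated by sampling random points and counting how many leaf bounding boxes contain each one. A minimal sketch, assuming the leaf boxes are available as `(xmin, ymin, xmax, ymax)` tuples; `leaf_hit_distribution` is a hypothetical helper name.

```python
import random
from collections import Counter
from typing import Dict, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def leaf_hit_distribution(leaf_boxes: Sequence[Rect], samples: int = 10_000,
                          seed: int = 0) -> Dict[int, float]:
    """Estimate, over uniformly random query points in the data's extent, the
    fraction of points that fall inside exactly k leaf boxes, for each k."""
    rng = random.Random(seed)
    xmin = min(b[0] for b in leaf_boxes)
    ymin = min(b[1] for b in leaf_boxes)
    xmax = max(b[2] for b in leaf_boxes)
    ymax = max(b[3] for b in leaf_boxes)
    counts: Counter = Counter()
    for _ in range(samples):
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Brute-force scan of all leaves; a real tool would descend the tree.
        k = sum(1 for b in leaf_boxes if b[0] <= x <= b[2] and b[1] <= y <= b[3])
        counts[k] += 1
    return {k: c / samples for k, c in sorted(counts.items())}
```

For a tree like the one described above, you would expect the entry for k = 1 to be at least 0.5 and the entries for k ≤ 2 to sum to roughly 0.9. The brute-force inner loop is O(samples × leaves), so for thousands of leaves it is worth routing each sample through the tree instead.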

I know this because I wrote a visualizer for my R* tree and the graphical objects inside it, and I tested it on a large dataset (600,000 road segments). It performs even better on point data (and other data in which bounding boxes rarely overlap). If you implement an R* tree I urge you to visualize the results, because when I wrote mine it had multiple bugs that lowered the efficiency of the tree (without affecting correctness), and I was able to tweak some of the decision-making to get better results. Be sure to test on a large dataset, as it will reveal problems that a small dataset does not. It may help to decrease the fan-out (node size) of the tree for testing, to see how well the tree works when it is several levels deep.
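The answer doesn't describe the visualizer itself. One dependency-free starting point is to dump each leaf's bounding box into an SVG document; this is a sketch under that assumption, and `boxes_to_svg` and its parameters are hypothetical.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def boxes_to_svg(boxes: Iterable[Rect], size: float = 800.0, stroke: str = "blue") -> str:
    """Render rectangles as an SVG document, scaled uniformly to fit `size` pixels."""
    boxes = list(boxes)
    xmin = min(b[0] for b in boxes)
    ymin = min(b[1] for b in boxes)
    xmax = max(b[2] for b in boxes)
    ymax = max(b[3] for b in boxes)
    # One scale factor for both axes so squares stay square.
    scale = size / max(xmax - xmin, ymax - ymin, 1e-12)
    height = (ymax - ymin) * scale
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size:.0f}" height="{size:.0f}">']
    for x0, y0, x1, y1 in boxes:
        # SVG's y axis points down; flip it so north is up, as on a map.
        x = (x0 - xmin) * scale
        y = height - (y1 - ymin) * scale
        w = (x1 - x0) * scale
        h = (y1 - y0) * scale
        parts.append(f'<rect x="{x:.2f}" y="{y:.2f}" width="{w:.2f}" height="{h:.2f}" '
                     f'fill="none" stroke="{stroke}" stroke-width="1"/>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a `.svg` file and opening it in a browser is enough to spot badly overlapping or oversized leaves; calling it once per tree level with a different `stroke` color makes the hierarchy visible too.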

I'd be happy to give you the source code except that I would need my employer's permission. You know how it is. In my implementation I support forced reinsertion, but my PickSplit and insertion penalty have been tweaked.
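For reference, forced reinsertion as described in the R* tree paper works like this: the first time a node on a given level overflows during one insertion, instead of splitting it, a fraction of its entries (the paper recommends 30% of the maximum node size) is removed and inserted again. The entries chosen are those whose centers lie farthest from the center of the node's bounding box. The sketch below covers only that selection step, not the author's tweaked version; `pick_reinsert` is a hypothetical name.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_reinsert(entries: List[Rect], max_entries: int,
                  p_fraction: float = 0.3) -> Tuple[List[Rect], List[Rect]]:
    """Return (kept, to_reinsert) for an overflowing node holding max_entries + 1 entries."""
    # Bounding box of the whole node and its center.
    node = (min(r[0] for r in entries), min(r[1] for r in entries),
            max(r[2] for r in entries), max(r[3] for r in entries))
    cx, cy = center(node)

    def dist(r: Rect) -> float:
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    # Farthest entries first; the first p of them leave the node.
    by_dist = sorted(entries, key=dist, reverse=True)
    p = max(1, round(p_fraction * max_entries))
    kept, removed = by_dist[p:], by_dist[:p]
    # "Close reinsert": reinsert starting with the removed entry nearest the center.
    removed.reverse()
    return kept, removed
```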

The original paper, The R* tree: An Efficient and Robust Access Method for Points and Rectangles, is missing dots for some reason (no periods and no dots on the "i"s). Also, their terminology is a bit weird, e.g. when they say "margin", what they mean is "perimeter".
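The paper's "margin" (perimeter) matters mainly in its split algorithm: it first picks the split axis by summing the margins of all candidate groupings along each axis, then, on that axis, picks the grouping with the least overlap, breaking ties by total area. Below is a minimal sketch of that standard split following the paper's description; it is not the author's tweaked PickSplit, and `rstar_split` and `min_fill` are illustrative names.

```python
from typing import Iterator, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(rs: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rs), min(r[1] for r in rs),
            max(r[2] for r in rs), max(r[3] for r in rs))

def margin(r: Rect) -> float:
    # The paper's "margin" is the perimeter of the rectangle.
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def distributions(rs: List[Rect], min_fill: int) -> Iterator[Tuple[List[Rect], List[Rect]]]:
    # Every split point that leaves at least min_fill entries in each group.
    for k in range(min_fill, len(rs) - min_fill + 1):
        yield rs[:k], rs[k:]

def rstar_split(entries: List[Rect], min_fill: int) -> Tuple[List[Rect], List[Rect]]:
    # ChooseSplitAxis: per axis, sort by lower and by upper edge, then sum the
    # margins of both groups' boxes over all distributions; keep the cheapest axis.
    best_sum, best_sorts = None, None
    for axis in (0, 1):
        sorts = [sorted(entries, key=lambda r: r[axis]),
                 sorted(entries, key=lambda r: r[axis + 2])]
        s = sum(margin(mbr(g1)) + margin(mbr(g2))
                for srt in sorts for g1, g2 in distributions(srt, min_fill))
        if best_sum is None or s < best_sum:
            best_sum, best_sorts = s, sorts
    # ChooseSplitIndex: on that axis, minimize overlap, then total area.
    best_key, best = None, None
    for srt in best_sorts:
        for g1, g2 in distributions(srt, min_fill):
            b1, b2 = mbr(g1), mbr(g2)
            key = (overlap(b1, b2), area(b1) + area(b2))
            if best_key is None or key < best_key:
                best_key, best = key, (g1, g2)
    return best
```

Using the margin sum to choose the axis favors square-ish boxes, which is one reason R* leaves overlap less than those produced by the original R-tree split.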

The R* tree is a good choice if you need a data structure that can be modified. If you don't need to modify the tree after you first create it, consider bulk loading algorithms. If you only need to modify the tree a small amount after bulk loading, ordinary R-tree algorithms will be good enough. Note that R* trees and R-trees are structurally identical; only the algorithms for insertion (and maybe deletion? I forget) are different. The R-tree is the original data structure from 1984; here's a link to the R-tree paper.
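The answer doesn't name a particular bulk loading algorithm. One well-known choice, swapped in here purely as an example, is Sort-Tile-Recursive (STR): sort the rectangles by the x coordinate of their centers, cut them into vertical slices, sort each slice by y, and pack runs of `node_size` entries into nodes; applying the same packing to the resulting node boxes builds the next level up. A minimal sketch with hypothetical function names; a real loader would also keep child pointers, not just boxes.

```python
import math
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(rs: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rs), min(r[1] for r in rs),
            max(r[2] for r in rs), max(r[3] for r in rs))

def str_pack(rects: Sequence[Rect], node_size: int) -> List[List[Rect]]:
    """Group rectangles into nodes of at most node_size entries using STR."""
    n = len(rects)
    if n == 0:
        return []
    node_count = math.ceil(n / node_size)
    slice_count = math.ceil(math.sqrt(node_count))
    per_slice = slice_count * node_size  # entries per vertical slice
    by_x = sorted(rects, key=lambda r: r[0] + r[2])  # center x (times 2)
    nodes = []
    for i in range(0, n, per_slice):
        strip = sorted(by_x[i:i + per_slice], key=lambda r: r[1] + r[3])  # center y (times 2)
        for j in range(0, len(strip), node_size):
            nodes.append(strip[j:j + node_size])
    return nodes

def str_levels(rects: Sequence[Rect], node_size: int) -> List[List[Rect]]:
    """Bounding boxes of every level, leaves first, up to a single root box."""
    level = [mbr(group) for group in str_pack(rects, node_size)]
    levels = [level]
    while len(level) > 1:
        level = [mbr(group) for group in str_pack(level, node_size)]
        levels.append(level)
    return levels
```

Because every node except possibly the last in each slice is completely full, a packed tree like this is usually smaller and shallower than one built by repeated insertion.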

The kd-tree looks efficient and not too difficult to implement, but it can only be used for point data.
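A minimal 2-d kd-tree sketch with a rectangular range query shows why it suits point data: each node splits the plane at one point's coordinate, alternating between x and y by depth, so there is no place to record an item's extent. Names are illustrative, and re-sorting at every level keeps the code short at the cost of an O(n log² n) build.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

@dataclass
class KDNode:
    point: Point
    axis: int                          # 0 = split on x, 1 = split on y
    left: Optional["KDNode"] = None    # points with coordinate <= point[axis]
    right: Optional["KDNode"] = None   # points with coordinate >= point[axis]

def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                # median point becomes this node
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))

def range_query(node: Optional[KDNode], lo: Point, hi: Point, out: List[Point]) -> None:
    """Append every point p with lo <= p <= hi (componentwise) to out."""
    if node is None:
        return
    x, y = node.point
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        out.append(node.point)
    a = node.axis
    # Only descend into a side that can still intersect the query box.
    if lo[a] <= node.point[a]:
        range_query(node.left, lo, hi, out)
    if node.point[a] <= hi[a]:
        range_query(node.right, lo, hi, out)
```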

By the way, the reason I focus on leaf nodes so much is that

  1. I need to deal with disk-based spatial indexes. You can generally cache all the inner nodes in memory because they are a tiny fraction of the index size; therefore the time it takes to scan them is tiny compared to the time required for a leaf node that is not cached.
  2. I save a lot of space by not storing bounding boxes for the elements in the spatial index, which means I have to actually test the original geometry of each element to answer a query. Thus it's even more important to minimize the number of leaf nodes touched.
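Point 1 can be checked with a little arithmetic: with fan-out M, each level has about 1/M as many nodes as the level below it, so all inner nodes together amount to roughly 1/(M-1) of the leaf count. A sketch assuming every node is packed full (real trees are usually less full, so these counts are lower bounds); the 600,000-segment and 80-entry figures are the ones quoted earlier in the answer, and the function names are hypothetical.

```python
import math
from typing import List

def level_sizes(n_items: int, fanout: int) -> List[int]:
    """Node count on each level, leaves first, assuming every node is full."""
    sizes = [math.ceil(n_items / fanout)]
    while sizes[-1] > 1:
        sizes.append(math.ceil(sizes[-1] / fanout))
    return sizes

def inner_fraction(n_items: int, fanout: int) -> float:
    """Share of all nodes that are inner (non-leaf) nodes."""
    sizes = level_sizes(n_items, fanout)
    return sum(sizes[1:]) / sum(sizes)

print(level_sizes(600_000, 80))                # → [7500, 94, 2, 1]
print(round(inner_fraction(600_000, 80), 4))   # → 0.0128
print(len(level_sizes(600_000, 8)))            # → 7
```

So about 1.3% of the nodes are inner nodes, which easily fit in memory. Dropping the fan-out to 8 for testing, as suggested earlier, raises the minimum height from 4 levels to 7.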
answered Oct 11 '22 by Qwertie


I developed an algorithm for quadrant-based fast search and published it on ddj.com a couple of years ago. Maybe it's interesting for you:

Accelerated Search For the Nearest Line http://drdobbs.com/windows/198900559

answered Oct 11 '22 by RED SOFT ADAIR