Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rtrees - basics of the algorithm

I am trying to understand the basics of RTree algorithm and I am trying to figure out how it performs the search of e.g. all retaurants within 1 km. We would have all objects stores in rectangles in our database, we would then (prbably) build a query rectangle, based on our current position, and then find all rectangles that overlap with it. WOuld we then scan through the results to find the ones of interest i.e. only objects which are restaurants?

like image 390
Bober02 Avatar asked Aug 13 '12 15:08

Bober02


Video Answer


1 Answers

Yes, this is basically how range queries on R-trees work: if a rectangle overlaps with your query region, expand it (look at the contents, rectangles or points). Otherwise, ignore it. Overlap testing is simple for rectangle-to-rectangle, and for spherical queries you need to compute the minimum distance of the sphere center to the rectangle ("minDist").

k nearest neighbor queries are a bit more tricky; here you need priority queues. Always expand the best candidate (by "minDist"), until you have found k objects that are closer than the next rectangles "minDist".

Since you can't really index the "is a restaurant" property, you'll have to either build an r-tree containing restaurants only, or filter the results by the restaurant property. (This also is how it is done e.g. in SQLite; the spatial part is indexed with an R-tree, while the restaurant property is e.g. obtained via a join or a bitmap index)

The tricky part of an R-tree is not the query, but how to build it. There are very simple but good methods for bulk loading point data (STR), but for an online database you need somewhat tricky methods. R*-trees outperform classic R-trees significantly in my experience; the reinsertions used by R*-trees are in particular tricky to implement in a real DBMS. An interesting tradeoff is to just use insert and split from R*, but not the reinsertions. On the query side, there is no difference between R and R* anyway.

kd-trees: They are related to r-trees, but have some key differences: first of all, they are not designed for disk storage, but in-memory operation only. Secondly, they are not meant to be updated (they are not balanced trees), but if you have changes you will have to rebuild them again every now and then to keep the performance good. So in some cases they will perform very well (and they are fairly simple to implement), but once you get to large data and on-disk they are much more painful. Furthermore, they do not allow for different loading strategies.

like image 82
Has QUIT--Anony-Mousse Avatar answered Oct 12 '22 22:10

Has QUIT--Anony-Mousse