How can R* Tree be implemented as a persistent (disk based) one? What is the architecture of the file for saving the R* Tree index or for saving leaf values?
Notes: In addition how insert, update and delete operations can be performed in such a persistent R* Tree?
Notes II: I have implemented an in-memory R-Tree with bulk load functionality. But I think that is totally irrelevant when we speak about disk-based ones.
Well, it's pages (= blocks). The pages should have a multiple of the page size of the underlying storage, so probably 1kb or 8kb blocks. Each block has a number and can be references this way.
Directory pages store the bounding boxes of the children and their page numbers.
Child pages store the actual data objects.
Well, in theory: whenever you modify a page in memory, you write the changes to disk. That's it.
In practise, you might want to use a cache for performance, and you might want to have transactions for keeping your tree consistent in case of an application crash.
On both of these things you can find a lot of literature in the field of RDBMS architecture.
A key benefit of the R*-tree is that it is a regular page-oriented tree as you would have them in database systems all over the place. If you have a good on disk B+-tree implementation, you can reuse most of your code for the R*-tree.
To get started, you need to get used to disk-based data indexing, as it is done in classic RDBMS. I'd suggest starting with an on disk B-tree or B+-tree. Do allow deletions, because you need to think about managing deleted pages and all this.
Once you figured out the B-Tree on disk (and maybe spend some time on optimizing it!), doing an on-disk R-tree should be fairly obvious.
I havn't looked at the code, but this might be a good starting point: http://www.die-schoens.de/prg/ or some of the others linked in Looking for a disk-based B+ tree implementation in C++ or C
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With