I know how a B-Tree works in-memory, it's easy enough to implement. However, what is currently completely beyond me, is how to find a data layout that works effectively on disk, such that:
If anyone could provide insight into layouting B-Tree structures on disk level, I'd be very grateful. Especially the last bullet point gives me a lot of headache. I would also appreciate pointers to books, but most database literature I've seen explains only the high-level structure (i.e. "this is how you do it in memory"), but skips the nitty gritty details on the disk layout.
B-Trees are a variation on binary search trees that allow quick searching in files on disk. Instead of storing one key and having two children, B-tree nodes have n keys and n+1 children, where n can be large. This shortens the tree (in terms of height) and requires much less disk access than a binary search tree would.
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. Unlike self-balancing binary search trees, it is optimized for systems that read and write large blocks of data. It is most commonly used in database and file systems.
Splitting a node in a B-treeThe median key moves up into y's parent--which must be nonfull prior to the splitting of y--to identify the dividing point between the two new trees; if y has no parent, then the tree grows in height by one. Splitting, then, is the means by which the tree grows.
UPDATE(archived version of oracle index internals): http://web.archive.org/web/20161221112438/http://www.toadworld.com/platforms/oracle/w/wiki/11001.oracle-b-tree-index-from-the-concept-to-internals
OLD (the original link does not exist anymore): some info about oracle index internals: http://www.toadworld.com/platforms/oracle/w/wiki/11001.oracle-b-tree-index-from-the-concept-to-internals
Notes:
Databases do not directly implement indexes based on B-tree but on a variant called B+ tree. Which according to wikipedia:
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves.
Databases work, in general, with block-oriented storage and b+ tree is more suited then a b-tree for this.
The blocks are fixed size and are left with some free space to accommodate future changes in value or key size.
A block can be either a leaf(holds actual data) or branch(holds the pointers to the leaf nodes)
A toy model how writing to disk can be implemented (for a block size 10k for arithmetic simplification):
When the information is read from a big index: can go following:
a really large index can be split on multiple files, then the address of block will be something as (filename_id, address_relative_to_this_file)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With