Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

B-Tree for on-disk storage

Why is a B-Tree the preferred structure for on-disk storage.
What quality makes it preferrable over a binary tree for secondary storage.
Is that specific 'quality' a feature of the alogrithm itself;or the way in which it is implemented?
Any reference or pointer would be much appreciated.

like image 602
IUnknown Avatar asked Sep 23 '13 09:09

IUnknown


People also ask

What are B+ trees used for?

B+ Tree are used to store the large amount of data which can not be stored in the main memory. Due to the fact that, size of main memory is always limited, the internal nodes (keys to access records) of the B+ tree are stored in the main memory whereas, leaf nodes are stored in the secondary memory.

What is a disk seek B-tree?

B-Tree structure is designed specifically to avoid disk seeks as much as possible. Therefore B-Tree packs much more keys/pointers into a single node than a binary tree. This property makes the tree very flat. Usually most B-Trees are only 3 or 4 levels deep and the root node can be easily cached.

What are B-Trees commonly used for?

A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. Unlike self-balancing binary search trees, it is optimized for systems that read and write large blocks of data. It is most commonly used in database and file systems.


1 Answers

Disk seeks are expensive. B-Tree structure is designed specifically to avoid disk seeks as much as possible. Therefore B-Tree packs much more keys/pointers into a single node than a binary tree. This property makes the tree very flat. Usually most B-Trees are only 3 or 4 levels deep and the root node can be easily cached. This requires only 2-3 seeks to find anything in the tree. Leaves are also "packed" this way, so iterating a tree (e.g. full scan or range scan) is very efficient, because you read hundreds/thousands data-rows per single block (seek).

In binary tree of the same capacity, you'd have several tens of levels and sequential visiting every single value would require at least one seek.

like image 117
Piotr Kołaczkowski Avatar answered Nov 12 '22 09:11

Piotr Kołaczkowski