Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Advantage of B+ trees over BSTs?

Tags:

I'm learning about B+ trees in a class about databases and I was wondering what concrete advantages B+ trees would give over Binary Search Trees?

It seems like they both have O(logN) average complexity for most operations of note but B+ trees also have an additional (negligible?) search time at each child node where BSTs obviously only take O(1) time to figure out which child node to advance to.

What real-world advantages make B+ trees more popular in databases than BSTs?

like image 566
riggspc Avatar asked Mar 18 '13 19:03

riggspc


People also ask

What are the advantages of BST over hash table?

Doing order statistics, finding closest lower and greater elements, doing range queries are easy to do with BSTs. Like sorting, these operations are not a natural operation with Hash Tables. BSTs are easy to implement compared to hashing, we can easily implement our own customized BST.

What is B-tree and its advantages?

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children.

Why are B+ trees preferred over binary trees?

Explanation: Disk access is slow and B+ Tree provide search in less number of disk hits. This is primarily because unlike binary search trees, B+ trees have very high fanout (typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

Why use a tree over a hash table?

3.2. Binary Search Trees are generally memory-efficient since they do not reserve more memory than they need to. On the other hand, Hash tables can be a bit more demanding if we don't know the exact number of elements we want to store.


1 Answers

The major advantage of the B+ tree (and B-trees in general) over binary search trees is that they play well with caches. If you have a binary search tree whose nodes are stored in more or less random order in memory, then each time you follow a pointer, the machine will have to pull in a new block of memory into the processor cache, which is dramatically slower than accessing memory already in cache.

The B+-tree and the B-tree work by having each node store a huge number of keys or values and have a large number of children. They are typically packed together in a way that makes it possible for a single node to fit nicely into cache (or, if stored on disk, to be pulled from the disk in a single read operation). You then have to do more work to find a key within the node or determine which child to read next, but because all memory accesses done on a single node can be done without going back to disk, the access times are very small. This means that even though in principle a BST might be better in terms of number of memory accesses, the B+-tree and the B-tree can performed better in terms of the runtime of those memory accesses.

The typical use case for a B+-tree or B-tree is in a database, where there is a huge amount of information and the data are so numerous that they can't all fit into main memory. Accordingly, the data can then be stored in a B+-tree or B-tree on a hard disk somewhere. This minimizes the number of disk reads necessary to pull in the data during lookups. Some filesystems (like ext4, I believe) use B-trees as well for the same reason - they minimize the number of disk lookups necessary, which is the real bottleneck.

Hope this helps!

like image 97
templatetypedef Avatar answered Oct 18 '22 18:10

templatetypedef