Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

B+ tree or B-tree

I am learning about postgresql internals and I am wondering or postgresql B-tree index is actually classic B-tree or B+tree? To spell it out, that means the nodes contain only keys or key-value pairs?

like image 854
Borys Avatar asked Jul 28 '14 21:07

Borys


People also ask

Why is B-tree called B-tree?

Bayer and McCreight never explained what, if anything, the B stands for: Boeing, balanced, broad, bushy, and Bayer have been suggested. McCreight has said that "the more you think about what the B in B-trees means, the better you understand B-trees."

What are the differences between B-tree and B+ tree?

B+ tree is an extension of the B tree. The difference in B+ tree and B tree is that in B tree the keys and records can be stored as internal as well as leaf nodes whereas in B+ trees, the records are stored as leaf nodes and the keys are stored only in internal nodes.

Is B+ tree better than B-tree?

The principal advantage of B+ trees over B trees is they allow you to pack in more pointers to other nodes by removing pointers to data, thus increasing the fanout and potentially decreasing the depth of the tree. The disadvantage is that there are no early outs when you might have found a match in an internal node.


2 Answers

I said B-trees, first, but it's arguably closer to B+ trees.
See iwis' answer discussing it more thoroughly.

You would really have to consider index + heap (+ auxiliary storage) together. An index is mostly useless on its own.

Here is a related chapter on Wikipedia.

The name of the relevant index method is "B-tree" in Postgres. Physical storage is very similar to that of tables (the heap) or any other index type. All use the same data pages with mostly the same page-layout. More in the manual.

Development is ongoing. The design has been changed (improved) in many aspects since this question was asked. The latest notable change (as of April 2021) being deduplication in Postgres 13.

like image 121
Erwin Brandstetter Avatar answered Oct 21 '22 21:10

Erwin Brandstetter


It seems to me that PostgreSQL uses B+ tree.

The difference between B-tree and B+ tree

  • In B-tree the pointers to records in the indexed table are not only in the leaves of the tree, but also in all internal nodes of the tree.
  • In B+ tree the pointers to records in the indexed table are only in the leaves of the tree. The advantages of B+ tree over B-tree are described here.

the picture is a modification (the picture is a modification of this picture)

Usage of B+ tree in DBMSs

Oracle, SQL Server, SQLite, DB2, and MySQL use B+ tree. It seems that also PostgreSQL uses B+ tree because:

  • The documentation seems to state that only the leaves of the tree have the pointers to records in the indexed table:

    Each leaf page contains tuples that point to table rows. Each internal page contains tuples that point to the next level down in the tree.

  • When Bruce Momjian says about internal nodes he doesn't mention that they have pointers to records in the indexed table.

  • The src/backend/access/nbtree/README file of the PostgreSQL source code mentioned in the documentation contains this comment:

    Btree Indexing

    This directory contains a correct implementation of Lehman and Yao's high-concurrency B-tree management algorithm (P. Lehman and S. Yao, Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions on Database Systems, Vol 6, No. 4, December 1981, pp 650-670).

    Lehman and Yao use a tree structure named B* tree, which was defined by Wedekind in On the selection of access paths in a data base system paper as B-tree where non-leaf nodes don't have pointers to records in the indexed table (they have pointers only to their children nodes). So the B* tree structure defined by Wedekind is a B+ tree.

like image 37
iwis Avatar answered Oct 21 '22 23:10

iwis