I am learning about postgresql internals and I am wondering or postgresql B-tree index is actually classic B-tree or B+tree? To spell it out, that means the nodes contain only keys or key-value pairs?
Bayer and McCreight never explained what, if anything, the B stands for: Boeing, balanced, broad, bushy, and Bayer have been suggested. McCreight has said that "the more you think about what the B in B-trees means, the better you understand B-trees."
B+ tree is an extension of the B tree. The difference in B+ tree and B tree is that in B tree the keys and records can be stored as internal as well as leaf nodes whereas in B+ trees, the records are stored as leaf nodes and the keys are stored only in internal nodes.
The principal advantage of B+ trees over B trees is they allow you to pack in more pointers to other nodes by removing pointers to data, thus increasing the fanout and potentially decreasing the depth of the tree. The disadvantage is that there are no early outs when you might have found a match in an internal node.
I said B-trees, first, but it's arguably closer to B+ trees.
See iwis' answer discussing it more thoroughly.
You would really have to consider index + heap (+ auxiliary storage) together. An index is mostly useless on its own.
Here is a related chapter on Wikipedia.
The name of the relevant index method is "B-tree" in Postgres. Physical storage is very similar to that of tables (the heap) or any other index type. All use the same data pages with mostly the same page-layout. More in the manual.
Development is ongoing. The design has been changed (improved) in many aspects since this question was asked. The latest notable change (as of April 2021) being deduplication in Postgres 13.
It seems to me that PostgreSQL uses B+ tree.
(the picture is a modification of this picture)
Oracle, SQL Server, SQLite, DB2, and MySQL use B+ tree. It seems that also PostgreSQL uses B+ tree because:
The documentation seems to state that only the leaves of the tree have the pointers to records in the indexed table:
Each leaf page contains tuples that point to table rows. Each internal page contains tuples that point to the next level down in the tree.
When Bruce Momjian says about internal nodes he doesn't mention that they have pointers to records in the indexed table.
The src/backend/access/nbtree/README file of the PostgreSQL source code mentioned in the documentation contains this comment:
Btree Indexing
This directory contains a correct implementation of Lehman and Yao's high-concurrency B-tree management algorithm (P. Lehman and S. Yao, Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions on Database Systems, Vol 6, No. 4, December 1981, pp 650-670).
Lehman and Yao use a tree structure named B* tree, which was defined by Wedekind in On the selection of access paths in a data base system paper as B-tree where non-leaf nodes don't have pointers to records in the indexed table (they have pointers only to their children nodes). So the B* tree structure defined by Wedekind is a B+ tree.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With