Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is a linked list in a B-tree node superior to an array?

I want to implement a B-tree index for my database.

I have read many data structure and algorithm books to learn how to do it. All implementations use an array to save data and child indexes.

Now I want to know: is a linked list in B-tree node superior to an array? There are some ideas I've thought about:

  1. when splitting a node, the copy operation will be more quickly than with an array.

  2. when inserting data, if the data is inserted into the middle or at the head of the array, the speed is lower than inserting to the linked list.

like image 846
KingKai Avatar asked Sep 30 '22 17:09

KingKai


2 Answers

The linked list is not better, in fact a simple array is not better either (except its simplicity which is good argument for it and search speed if sorted).

You have to realize that the "array" implementation is more a "reference" implementation than a true full power implementation. For example, the implementation of the data/key pairs inside a B-Tree node in commercial implementations uses many strategies to solve two problems: storage efficiency and efficient search of keys in the node.

With regard with efficient search, an array of key/value with an internal balanced tree structure on the top of it can make insertion/deletion/search be done in O(log N), for large B tree nodes it makes sense.

With regard to memory efficiency, the nature of data in the key and value is very important. For example, lexicographical keys can be shorten by a common start (e.g. "good", "great" have "g" in common), the data might be compressed as well using any possible scheme relevant to the nature of the data. The compression of keys is more complex as you will want to keep this lexicographical property. Remember that the more data and keys you stuff in a node, the fastest are the disk accesses.

The time to split a node is only partially relevant, as it will be much less than the time to read or write a node on typical media by several order of magnitude. On SSD and extremely fast disks (by 10 to 20 years it is expected to have disks as fast as RAM), many researches are conducted to find a successor to B-Trees, stratified B-Trees are an example.

like image 92
armel Avatar answered Oct 03 '22 06:10

armel


If the BTree is itself stored on the disk then a linked list will make it very complicated to maintain.

Keep the B-Tree structure compact. This will allow more nodes per page, locality of data and allowing caching of more nodes, and fewer disk reads/cache misses.

Use an array.

The perceived in-memory computational benefits are inconsequential.

So, in short, no, linked list is not superior.

like image 23
B-Tree Dog Avatar answered Oct 03 '22 06:10

B-Tree Dog