Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any algorithm for bulk loading in B-Tree?

I know there is bulk loading in b+tree. I just wanted to know is there any algorithm for bulk loading in B-Tree. For example given an array of data what is the best way to create a B-Tree?

like image 983
nbbk Avatar asked Apr 14 '13 06:04

nbbk


People also ask

What is B-tree algorithm?

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children.

What is the limitation of B-tree?

The major drawback of B-tree is the difficulty of traversing the keys sequentially.

Is repetition allowed in B-tree?

Allowing Duplicate Records. BTree databases can contain duplicate records. One record is considered to be a duplicate of another when both records use keys that compare as equal to one another.

What are the disadvantages of B-tree over B+ tree?

Insertion in B tree is more complicated than B+ tree. B+ trees store redundant search keys but B tree has no redundant value. In a B+ tree, leaf node data is ordered as a sequential linked list but in a B tree the leaf node cannot be stored using a linked list.


1 Answers

Actually The answer is yes.

The main difference between B+-trees and plain B-trees is that the values are actually stored in the leaves for the former, while in the later you will find values in every nodes. Hence B+-trees let you store data in an almost continuous manner, each leaf containing a contiguous slice of the whole sorted data. This cannot be true for B-trees: an inner node will contain several elements, but they won't be contiguous wrt. the whole sorted dataset.

This property is essential for bulk loading: that process works on an already sorted dataset by cutting it into the arrays which will form the leaves of the B+-tree. Thus for a B-tree it looks like it cannot work.

If we can sort the data in a way which group together inner nodes elements, then the problem is solved. In order to do that, one must know in advance how the elements will be grouped. This turns out to be possible.

Let's call o (order) the minimal number of children in a node (that's consistent with the original definition of a B-tree order). We consider the root node to be at the highest stage of the tree, and the leaves to be at the lowest (stage 0). For a well balanced tree, all the leaves will indeed be at the same stage.

The stage k of the tree groups elements which are spaced by at least o elements in the stage k-1. After an initial sorting, we must extract elements from the sorted array, which constitutes stage 0, and group them in a different array to build stage 1, then do that again with that array into a new array for stage 2, and repeat the process until there's less than o elements in the newest array, which will be the root stage. From then on, it is possible to build the tree directly from the stage set:

  • split each stage in arrays of o elements,
  • build index arrays to link nodes to subnodes
  • build each node as the pair of corresponding index array * value array

The resulting tree will not necessarily be well balanced. It depends on the number of entries in the dataset, and o. It should be possible to tune the interval used in building the stages to have a better distributed tree though.

All in all the work needed to bulk load a B-tree is more tedious than for B+-tree, but it is possible.

like image 94
didierc Avatar answered Sep 29 '22 20:09

didierc