Is there any algorithm for bulk loading in B-Tree?

1 Answers

Actually The answer is yes.

The main difference between B+-trees and plain B-trees is that the values are actually stored in the leaves for the former, while in the later you will find values in every nodes. Hence B+-trees let you store data in an almost continuous manner, each leaf containing a contiguous slice of the whole sorted data. This cannot be true for B-trees: an inner node will contain several elements, but they won't be contiguous wrt. the whole sorted dataset.

This property is essential for bulk loading: that process works on an already sorted dataset by cutting it into the arrays which will form the leaves of the B+-tree. Thus for a B-tree it looks like it cannot work.

If we can sort the data in a way which group together inner nodes elements, then the problem is solved. In order to do that, one must know in advance how the elements will be grouped. This turns out to be possible.

Let's call o (order) the minimal number of children in a node (that's consistent with the original definition of a B-tree order). We consider the root node to be at the highest stage of the tree, and the leaves to be at the lowest (stage 0). For a well balanced tree, all the leaves will indeed be at the same stage.

The stage k of the tree groups elements which are spaced by at least o elements in the stage k-1. After an initial sorting, we must extract elements from the sorted array, which constitutes stage 0, and group them in a different array to build stage 1, then do that again with that array into a new array for stage 2, and repeat the process until there's less than o elements in the newest array, which will be the root stage. From then on, it is possible to build the tree directly from the stage set:

split each stage in arrays of o elements,
build index arrays to link nodes to subnodes
build each node as the pair of corresponding index array * value array

The resulting tree will not necessarily be well balanced. It depends on the number of entries in the dataset, and o. It should be possible to tune the interval used in building the stages to have a better distributed tree though.

All in all the work needed to bulk load a B-tree is more tedious than for B+-tree, but it is possible.

answered Sep 29 '22 20:09

didierc

Related questions
                            
                                Can I access an object in C++ other than using an expression?
                            
                                Fibonacci Rabbits that don't die
                            
                                Finding a representative mean INTERIOR point for a non-convex polygon
                            
                                What is the keyword "incr" in the KDevelop IDE?
                            
                                How to do this with std::bind?
                            
                                Intercept file\folder I\O without API hooking or filter driver
                            
                                C++ code migration: handling uninitialized pointers
                            
                                Class template with both pointer type and regular type
                            
                                Being extremely pedantic with the way your code is compiled
                            
                                Ray-mesh intersection or AABB tree implementation in C++ with little overhead?
                            
                                How to write a variadic method which replaces chained method calls?
                            
                                Solving design involving multiple inheritance and composite classes in c++
                            
                                Visual Studio 2012 "Smart" Indentation Customization
                            
                                Is a file guaranteed to be openable for reading immidiately after ofstream::close() has returned?
                            
                                How do I grab frames from a video stream on Windows 8 modern apps?
                            
                                Recommended improved match-finding algorithm for Bejeweled game?
                            
                                Round y=x*x to nearest
                            
                                How to correctly reference a dll in Visual Studio 2010?
                            
                                Algorithm to match one input file with given numbers of file
                            
                                What is the curly bracket parameter in the constructor of C++11 [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there any algorithm for bulk loading in B-Tree?

Tags:

c++

algorithm

b-tree

nbbk

People also ask

1 Answers

didierc

Recent Activity

Donate For Us