Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

splitting a node in b+ tree

I'm trying to figure out what exactly happens when there is a node overflow. info: in my b+ tree there are 4 pointers per block and 3 data sections . problem: I understood that when there is an overflow we split into 2 nodes in my case each with 2 keys, and insert the parent node the mid value, without erasing from the son(unlike in b tree).

however I got into situation:

                                |21|30|50|

           |10|20|-|      |21|22|25|  |30|40|-| |50|60|80|  

and I want to insert the key 23 first I split |21|22|25| into: |21|22|-| and |23|25|-| now I need to insert the key 23 to the parent |21|30|50| witch causes another split. |21|23|-| and |30|50|-| but where does the pointer before 30 points to? is it possible that both this pointer & the one after 23 point to |23|25|-| ?

like image 381
farchy Avatar asked Jun 11 '11 07:06

farchy


People also ask

How do you split a node B-tree?

Splitting a node in a B-treeThe median key moves up into y's parent--which must be nonfull prior to the splitting of y--to identify the dividing point between the two new trees; if y has no parent, then the tree grows in height by one. Splitting, then, is the means by which the tree grows.

How many keys can each node have in a B-tree?

All nodes (including root) may contain at most (2*t – 1) keys.

How many nodes does B-tree have?

2-3 Tree (applet) A 2-3 tree is a type of B-tree where every node with children (internal node) has either two children and one data element (2-nodes) or three children and two data elements (3-node). Leaf nodes have no children and one or two data elements.

What is a node in a B+ tree?

1 B+ Trees. A tree is an abstract data type, consisting of a set of linked nodes. The nodes are hierarchical, and the top most node is called the root. An internal node is a node that has at least one child. A leaf node is a node with no children.


2 Answers

When inserting 23:

  • as you said, 21|22|-| and |23|25|-| are created
  • the 2 nodes require a parent
  • a parent is created in the the root node: |21|23|30|50|
  • the root has too many elements now
  • split the root in 2 nodes |21|23|- and |30|50|-
  • add a new parent for the 2 new nodes (which happens to be the new root of the tree)

Basically, that insert will increase the depth of the tree by 1

like image 67
Victor Hurdugaci Avatar answered Oct 21 '22 02:10

Victor Hurdugaci


Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)

                [21             30           50]
               /   \              \            \
       [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]  

After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.

So splits should look like this.

  old          fresh 
[21|22|-] => [23|25|-]

since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.

                [21                             30           50]
               /   \                              \            \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]  

only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).

                [21                          30]          [ 50 ]
               /   \                           \          *    \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]  

Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!

(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)

our middle key was 23. so lets add 23.

                [21            23             30]          [ 50 ]
               /   \             \              \          *    \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]  

Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.

(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)

here new middle key is 30. lets pop it from left node.

 30
   \
[30|40|-]

(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)

                [21            23]            30           [ 50 ]
               /   \             \              \          *    \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]  

Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.

                [21            23]       30            [50]
               /   \             \         *          /    \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]  

Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.

                                   [       30        ]
                                  /                   \
                [21            23]                     [50]
               /   \             \                    /    \
       [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]

Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.


There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).

like image 23
M.kazem Akhgary Avatar answered Oct 21 '22 03:10

M.kazem Akhgary