Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pointers in B Tree

I read about B trees and understand their input, delete methods. I read with an introduction like this:

When we build structures on disk, we must deal with certain realities of access and transfer time:

  1. Random access to disk typically requires on the order of 10-20 ms access time to position the head and wait for data to come up under it.
  2. Once the head position is right, data can be transferred at rates in excess of 1 million bytes/sec.
  3. Observe, then, how total transfer times behave for different size blocks (assuming a fairly fast 10 ms access time, and 1 megabyte/sec transfer rate)

So, B Tree Data Structure is made for serving from disk ( which is what makes them great for Databases ). But when I tried to implement it, I hit this problem.

Normal B Tree diagrams shows pointers to child nodes which then descend down to leaves.

But how do I make pointers on the disk? Is it like a file name ?

like image 778
Ashish Negi Avatar asked Oct 02 '13 08:10

Ashish Negi


People also ask

What is B-tree pointer?

Pointers in the leaf nodes points to data records or blocks, (except for the pointer to the next leaf node). You can call these data pointers, if you want. So you can say the pointers point to two different things; nodes or data, but that is just a way of organizing your thoughts.

How many pointers are in a B+ tree?

The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values. At most, a leaf node contains n record pointer and n key values. Every leaf node of the B+ tree contains one block pointer P to point to next leaf node.

What is record pointer in B-tree?

Block pointer will point to the child of the B tree, Record pointer will point to the the actual data of the node and key pointer will point to the key of the actual data. All the nodes of a B-tree will have all 3 pointers.

What are the minimum numbers of keys and pointers in B-tree?

Bydefinition of B Tree, minimum children that a node can have would be 6/2 = 3. Therefore, minimum number of keys that a node can have becomes 2 (3-1).


1 Answers

On-disk-pointers are offsets from the beginning of the file.

if your key points to address n then it means that

  1. open the data-file
  2. read n bytes but discard them (or simply jump over them. that is called seeking. see below for how)
  3. start reading the data which is of your interest.

now, as optimizations,

  1. the data file could be already open, say when your program starts; could be cached in memory of course partially.
  2. rather than reading and discarding the bytes you can specifically instruct the framework to goto a specific location in the file. most languages have that feature. all OS do. This is called seeking. You call a method like file.seek(1024). To perform the jump, the OS has to know which point in disk the data you are seeking is located. That involves some more lookup, some disk movement, but that all is done by the OS.
  3. you can start reading data, but to know when to stop, either have fixed-width records or you can put the record-length in the first 4-bytes of your record. this makes headers and metadata which would grow with complexity.

what is interesting is that the pointers associated with each key point to left and right nodes there is no place for the data. so, in a textbook example like

struct node {
    int key;                      //this generally is the primary key of the table
    node left;
    node right;

    long offsetOfDataInDataFile;  // <----------- we need to add this line.
}

this way first you locate the node in the tree. then you locate the key. there you get the offset to the actual data. you goto the location in the data file and read contents.

if your table has got multiple indexes, then or each of the index in the table, one such tree would need to be maintained. the key of that tree would be the contents of the column which is indexed.

like image 189
inquisitive Avatar answered Oct 17 '22 16:10

inquisitive