I read about B trees and understand their input, delete methods. I read with an introduction like this:
When we build structures on disk, we must deal with certain realities of access and transfer time:
- Random access to disk typically requires on the order of 10-20 ms access time to position the head and wait for data to come up under it.
- Once the head position is right, data can be transferred at rates in excess of 1 million bytes/sec.
- Observe, then, how total transfer times behave for different size blocks (assuming a fairly fast 10 ms access time, and 1 megabyte/sec transfer rate)
So, B Tree Data Structure is made for serving from disk ( which is what makes them great for Databases ). But when I tried to implement it, I hit this problem.
Normal B Tree diagrams shows pointers to child nodes which then descend down to leaves.
But how do I make pointers on the disk? Is it like a file name ?
Pointers in the leaf nodes points to data records or blocks, (except for the pointer to the next leaf node). You can call these data pointers, if you want. So you can say the pointers point to two different things; nodes or data, but that is just a way of organizing your thoughts.
The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values. At most, a leaf node contains n record pointer and n key values. Every leaf node of the B+ tree contains one block pointer P to point to next leaf node.
Block pointer will point to the child of the B tree, Record pointer will point to the the actual data of the node and key pointer will point to the key of the actual data. All the nodes of a B-tree will have all 3 pointers.
Bydefinition of B Tree, minimum children that a node can have would be 6/2 = 3. Therefore, minimum number of keys that a node can have becomes 2 (3-1).
On-disk-pointers are offsets
from the beginning of the file.
if your key
points to address n then it means that
now, as optimizations,
file.seek(1024)
. To perform the jump, the OS has to know which point in disk the data you are seeking is located. That involves some more lookup, some disk movement, but that all is done by the OS.headers
and metadata which would grow with complexity.what is interesting is that the pointers associated with each key
point to left
and right
nodes
there is no place for the data. so, in a textbook example like
struct node {
int key; //this generally is the primary key of the table
node left;
node right;
long offsetOfDataInDataFile; // <----------- we need to add this line.
}
this way first you locate the node
in the tree
. then you locate the key
. there you get the offset
to the actual data. you goto the location in the data file and read contents.
if your table has got multiple indexes, then or each of the index in the table, one such tree would need to be maintained. the key
of that tree would be the contents of the column which is indexed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With