Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How btree is stored on disc?

I know how to implement btree in memory, but not clear about how to store btree in disc. I think there are two major difference:

  1. Conversion between memory pointer and disc address, see this post.
  2. How to split page when insert new k/v item? It is very easy to implement in memory.

Thanks

like image 636
Chang Avatar asked Jan 14 '11 07:01

Chang


People also ask

How are data structures stored on disk?

Disk platters are formatted in a system of concentric circles, or rings, called tracks. Within each track are sectors, which subdivide the circle into a system of arcs, each formatted to hold the same amount of data—typically 512 bytes.

Where are B-trees stored?

A B-tree of order m is a search tree in which each nonleaf node has up to m children. The actual elements of the collection are stored in the leaves of the tree, and the nonleaf nodes contain only keys.

What is B-tree file structure?

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children.


2 Answers

it all depends on DBMS you use. If you wanna know how it is implemented in MS SQL Server, things to read about are:

  • Pages (I guess they are in almost all modern DBMS's) - in SQL Server they are 8Kb. Database file is composed from pages.
  • Extents - logical groups of 8 continous pages
  • (S)GAM - (Shared) Global Allocation Map. Bitmap containing information about free and occupied extents. This is one of the first pages on database file.
  • IAM - Index Allocation Map. You can find out which index/heap is stored in which extents. Having this information, you can get place in the file where index/heap is stored.

Using IAM and GAM (or SGAM) you can split page - just move part of the page (which is supposed to be overflowed) to the another page on file.

IAM and GAM are also answers to your first question.

Most of these names are taken from MS SQL Server but I'm pretty sure, that in other DBMS's it is solved quite similar.

Hope it helps.

like image 173
Marcin Krupowicz Avatar answered Oct 01 '22 12:10

Marcin Krupowicz


My recommendation is to have a look at the book Database System Implementation"

Chapter 2 "data storage" and chapter 3 "representing data elements" wil give you some hints about this problem.

Chapter 4 index structures has a section on Btrees

It's the best source of information I have found so far on this topic.

like image 20
Manuel Salvadores Avatar answered Oct 01 '22 12:10

Manuel Salvadores