Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ordered Linked List vs B-Tree

If you have a b+ tree as an index, then this seems to be very similar to a ordered linked list. But the ordered linked list seem to have some advantages, such as not having to navigate a tree structure and also not having to rebuild nodes when they get full, and not having to rebuild the tree when it gets unballanced.

Can anyone answer what is the reason for using a b-tree rather than an ordered linked list?

like image 499
Adam Davies Avatar asked Dec 05 '11 11:12

Adam Davies


2 Answers

After some research and paper reading I found the answer.

In order to cope with large amounts of data such as millions of records, indexes have to be organised into clusters. A cluster is a continuous group of sectors on a disk that can be read into memory quickly. These are usually about 4096 bytes long.

If we organise our index so that each of these clusters contains a ordered list of indexes which point to data on a disk (a data cluster), we can have a ordered list index.

So, if we are looking for a specific record, how do we know which cluster it is on? We perform a binary search to find the cluster in question [O(log n)].

However, to do a binary search we need to know the range of values in each data cluster, so we need meta-data that says the min and max value of each cluster and where that cluster is. This is great. Except if each data cluster contains 100 indexes, and our meta data cluster also contains 100 indexes then we can only access 100 data clusters. Which equates to 10 000 records (100 X 100).

Well that’s not enough. Lets add another meta-data cluster and we can now access 1 000 000 records. So how do we know which one of the three meta-data clusters we need to query in order to find our target data cluster. We could search them one after the other, but that doesn’t scale, and its O(n/2) on average. So I add another meta-meta-data cluster to indicate which one of the three meta-data clusters I should query to find the target data cluster. Now I have a tree!

So that’s why databases use trees. It’s not the speed, it’s the size of the indexes and the need to have indexes referencing other indexes. What I have described above is a B+Tree – child nodes contain references to other child nodes or leaf nodes, and leaf nodes contain references to data on disk.

Phew!

like image 127
Adam Davies Avatar answered Nov 17 '22 04:11

Adam Davies


There are many difference in the characteristics, but I emphasize search time.

While search is O(log N) time in b+ tree it is O(n) time in linked list, even if it is sorted.

Source: http://en.wikipedia.org/wiki/Linked_list, http://en.wikipedia.org/wiki/B%2B_tree

like image 2
bpgergo Avatar answered Nov 17 '22 03:11

bpgergo