Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Berkeleydb - B-Tree versus Hash Table

I am trying to understand what should drive the choice of the access method while using a BerkeleyDB : B-Tree versus HashTable. A Hashtable provides O(1) lookup but inserts are expensive (using Linear/Extensible hashing we get amortized O(1) for insert). But B-Trees provide log N (base B) lookup and insert times. A B-Tree can also support range queries and allow access in sorted order.

  1. Apart from these considerations what else should be factored in?
  2. If I don't need to support range queries can I just use a Hashtable access method?
like image 394
rakeshr Avatar asked Nov 09 '10 00:11

rakeshr


1 Answers

When your data sets get very large, B-trees are still better because the majority of the internal metadata may still fit in cache. Hashes, by their nature (uniform random distribution of data) are inherently cache-unfriendly. I.e., once the total size of the data set exceeds the working memory size, hash performance drops off a cliff while B-tree performance degrades gracefully (logarithmically, actually).

like image 67
hyc Avatar answered Sep 20 '22 11:09

hyc