Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash table vs Balanced binary tree [closed]

What factors should I take into account when I need to choose between a hash table or a balanced binary tree in order to implement a set or an associative array?

like image 746
peoro Avatar asked Jan 31 '11 00:01

peoro


People also ask

Why is a hash table better than a binary tree?

The main advantage of the hash table over self-balancing binary search trees is the constant speed of access. It is partially efficient when the maximum number of entries is known so that the underlying bucket array can be allocated once and never be resized.

What is the advantage of a hash table over BST?

What is the advantage of a hash table over BST? Explanation: Hash table and BST both are examples of data structures. Hash table has an advantage that it has a better time complexity for performing insert, delete and search operations.

Is Hashtable faster than binary tree?

Hashes are typically faster, although binary searches have better worst-case characteristics.

What is the main reason to use hash table instead of trees?

Hash tables have better time complexity bounds on search, delete, and insert operations in comparison to self-balancing binary search trees.


2 Answers

This question cannot be answered, in general, I fear.

The issue is that there are many types of hash tables and balanced binary trees, and their performances vary widely.

So, the naive answer is: it depends on the functionality you need. Use a hash table if you do not need ordering and a balanced binary tree otherwise.

For a more elaborate answer, let's consider some alternatives.

Hash Table (see Wikipedia's entry for some basics)

  • Not all hash tables use a linked-list as a bucket. A popular alternative is to use a "better" bucket, for example a binary tree, or another hash table (with another hash function), ...
  • Some hash tables do not use buckets at all: see Open Addressing (they come with other issues, obviously)
  • There is something called Linear re-hashing (it's a quality of implementation detail), which avoids the "stop-the-world-and-rehash" pitfall. Basically during the migration phase you only insert in the "new" table, and also move one "old" entry into the "new" table. Of course, migration phase means double look-up etc...

Binary Tree

  • Re-balancing is costly, you may consider a Skip-List (also better for multi-threaded accesses) or a Splay Tree.
  • A good allocator can "pack" nodes together in memory (better caching behavior), even though this does not alleviate the pointer-look-up issue.
  • B-Tree and variants also offer "packing"

Let's not forget that O(1) is an asymptotic complexity. For few elements, the coefficient is usually more important (performance-wise). Which is especially true if your hash function is slow...

Finally, for sets, you may also wish to consider probabilistic data structures, like Bloom Filters.

like image 196
Matthieu M. Avatar answered Sep 25 '22 00:09

Matthieu M.


Hash tables are generally better if there isn't any need to keep the data in any sort of sequence. Binary trees are better if the data must be kept sorted.

like image 33
supercat Avatar answered Sep 26 '22 00:09

supercat