Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why implement a Hashtable with a Binary Search Tree?

When implementing a Hashtable using an array, we inherit the constant time indexing of the array. What are the reasons for implementing a Hashtable with a Binary Search Tree since it offers search with O(logn)? Why not just use a Binary Search Tree directly?

like image 826
Michael Avatar asked Apr 10 '14 18:04

Michael


People also ask

What are the advantages of a binary search tree over a hash table?

Binary Search Trees are generally memory-efficient since they do not reserve more memory than they need to. On the other hand, Hash tables can be a bit more demanding if we don't know the exact number of elements we want to store.

What is the main reason for hash table instead of trees like BST?

In BST we can do range searches efficiently but in Hash Table we cannot do range search efficienly. BST are memory efficient but Hash table is not.

Why do you need to implement a hash table?

The idea of a hash table is to provide a direct access to its items. So that is why the it calculates the "hash code" of the key and uses it to store the item, insted of the key itself. The idea is to have only one hash code per key.

What is the purpose of binary search trees?

Simply put, a binary search tree is a data structure that allows for fast insertion, removal, and lookup of items while offering an efficient way to iterate them in sorted order.


1 Answers

If the elements don't have a total order (i.e. the "greater than" and "less than" is not be defined for all pairs or it is not consistent between elements), you can't compare all pairs, thus you can't use a BST directly, but nothing's stopping you from indexing the BST by the hash value - since this is an integral value, it obviously has a total order (although you'd still need to resolve collision, that is have a way to handle elements with the same hash value).

However, one of the biggest advantages of a BST over a hash table is the fact that the elements are in order - if we order it by hash value, the elements will have an arbitrary order instead, and this advantage would no longer be applicable.

As for why one might consider implementing a hash table using a BST instead of an array, it would:

  • Not have the disadvantage of needing to resize the array - with an array, you typically mod the hash value with the array size and resize the array if it gets full, reinserting all elements, but with a BST, you can just directly insert the unchanging hash value into the BST.

    This might be relevant if we want any individual operation to never take more than a certain amount of time (which could very well happen if we need to resize the array), with the overall performance being secondary, but there might be better ways to solve this problem.

  • Have a reduced risk of hash collisions since you don't mod with the array size and thus the number of possible hashes could be significantly bigger. This would reduce the risk of getting the worst-case performance of a hash table (which is when a significant portion of the elements hash to the same value).

    What the actual worst-case performance is would depend on how you're resolving collisions. This is typically done with linked-lists for O(n) worst case performance. But we can also achieve O(log n) performance with BST's (as is done in Java's hash table implementation if the number of elements with some hash are above a threshold) - that is, have your hash table array where each element points to a BST where all elements have the same hash value.

  • Possibly use less memory - with an array you'd inevitably have some empty indices, but with a BST, these simply won't need to exist. Although this is not a clear-cut advantage, if it's an advantage at all.

    If we assume we use the less common array-based BST implementation, this array will also have some empty indices and this would also require the occasional resizing, but this is a simply memory copy as opposed to needing to reinsert all elements with updated hashes.

    If we use the typical pointer-based BST implementation, the added cost for the pointers would seemingly outweigh the cost of having a few empty indices in an array (unless the array is particularly sparse, which tends to be a bad sign for a hash table anyway).

But, since I haven't personally ever heard of this ever being done, presumably the benefits are not worth the increased cost of operations from expected O(1) to O(log n).

Typically the choice is indeed between using a BST directly (without hash values) and using a hash table (with an array).

like image 97
Bernhard Barker Avatar answered Oct 08 '22 14:10

Bernhard Barker