Hash table - implementing with Binary Search Tree

Tags: hashtable, data-structures, binary-search-tree

From Cracking the Coding Interview, page 71:

Alternatively, we can implement the hash table with a BST. We can then guarantee an O(log n) lookup time, since we can keep the tree balanced. Additionally, we may use less space, since a large array no longer needs to be allocated in the very beginning.

I know the basics of linked lists, hash tables, and BSTs, but I am unable to understand these lines. What do they actually mean? Would the final data structure be a trie?

323

asked Jan 02 '15 08:01

Divyanshu Jimmy

2 Answers

The full text of that section states, with the last paragraph being the one you asked about:

A hash table is a data structure that maps keys to values for highly efficient lookup. In a very simple implementation of a hash table, the hash table has an underlying array and a hash function. When you want to insert an object and its key, the hash function maps the key to an integer, which indicates the index in the array. The object is then stored at that index.

Typically, though, this won't quite work right. In the above implementation, the hash value of all possible keys must be unique, or we might accidentally overwrite data. The array would have to be extremely large—the size of all possible keys—to prevent such "collisions."

Instead of making an extremely large array and storing objects at index hash(key), we can make the array much smaller and store objects in a linked list at index hash(key) % array_length. To get the object with a particular key, we must search the linked list for this key.

Alternatively, we can implement the hash table with a binary search tree. We can then guarantee an O(log n) lookup time, since we can keep the tree balanced. Additionally, we may use less space, since a large array no longer needs to be allocated in the very beginning.

So they're talking about using a BST (binary search tree) to handle collisions. It wouldn't actually make sense to use a BST as the sole data structure, since the whole point of a properly tuned hash table is that look-up is O(1), much better than the O(log n) from a BST. On top of that, using a BST to totally implement a hash table means it's not actually a hash table :-)

However, consider that, when you have collisions in a hash table, a frequent way to handle them is to have each bucket contain a linked list of its items. In the degenerate case (all items hashing to the same bucket), you end up with just a linked list and the O(1) turns into O(n).
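To make the degenerate case concrete, here is a minimal sketch of a chained hash table that also reports how many key comparisons a lookup needed. The class name `ChainedHashTable`, the injectable `hash_fn` parameter, and the comparison counter are assumptions made for illustration, and a Python list stands in for the linked list.

```python
class ChainedHashTable:
    """Hash table that resolves collisions with one chain per bucket."""

    def __init__(self, num_buckets=8, hash_fn=hash):
        # A Python list stands in for the linked list in each bucket.
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn  # injectable so collisions can be forced

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        """Return (value, comparisons); raise KeyError if the key is absent."""
        comparisons = 0
        for k, v in self._bucket(key):
            comparisons += 1
            if k == key:
                return v, comparisons
        raise KeyError(key)


# Degenerate case: a hash function that sends every key to bucket 0.
bad = ChainedHashTable(num_buckets=8, hash_fn=lambda key: 0)
for n in range(100):
    bad.put(n, str(n))

value, comparisons = bad.get(99)
print(value, comparisons)  # 99 100
```

With every key in one bucket, finding the last key inserted walks the whole chain, which is the O(n) behaviour described above.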

So, rather than a linked list at each bucket, you have a BST. Then you no longer have O(n) search complexity in cases where a single bucket has many items (the previously mentioned collisions).

You use the hash function to find the bucket in O(1), then search through that bucket's BST in O(log n) if there are collisions. In the best case (one item per bucket), it's still O(1). Provided each bucket's tree is kept balanced, the worst case then becomes O(log n) rather than O(n).
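A rough sketch of the bucket-per-BST idea follows. Everything named here (`TreeBucketHashTable`, `_Node`, `balanced_order`, the step counter) is hypothetical, and two simplifications are assumed: keys must be comparable with `<`, and the tree is a plain BST with no rebalancing. The demo therefore inserts keys in an order that happens to yield a balanced tree; a real implementation would need a self-balancing tree (red-black, AVL, or similar) to keep the O(log n) bound for arbitrary insertion orders.

```python
class _Node:
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


class TreeBucketHashTable:
    """Hash table whose buckets are plain (unbalanced) binary search trees."""

    def __init__(self, num_buckets=8, hash_fn=hash):
        self.roots = [None] * num_buckets
        self.hash_fn = hash_fn  # injectable so collisions can be forced

    def _index(self, key):
        return self.hash_fn(key) % len(self.roots)

    def put(self, key, value):
        i = self._index(key)
        if self.roots[i] is None:
            self.roots[i] = _Node(key, value)
            return
        node = self.roots[i]
        while True:
            if key == node.key:
                node.value = value  # overwrite an existing key
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(key, value))
                return
            node = child

    def get(self, key):
        """Return (value, nodes_visited); raise KeyError if the key is absent."""
        node = self.roots[self._index(key)]
        steps = 0
        while node is not None:
            steps += 1
            if key == node.key:
                return node.value, steps
            node = node.left if key < node.key else node.right
        raise KeyError(key)


def balanced_order(lo, hi):
    """Yield lo..hi-1 in an order for which plain BST insertion stays balanced."""
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from balanced_order(lo, mid)
    yield from balanced_order(mid + 1, hi)


# Same degenerate hash as before: every key lands in bucket 0.
table = TreeBucketHashTable(num_buckets=8, hash_fn=lambda key: 0)
for n in balanced_order(0, 127):
    table.put(n, str(n))

worst = max(table.get(n)[1] for n in range(127))
print(worst)  # 7, versus up to 127 for a linear chain
```

If the same 127 keys were inserted in sorted order instead, this unbalanced tree would degrade to a chain, which is why the quoted guarantee depends on keeping the tree balanced.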

The only thing that originally concerned me about that theory is that they also discuss the fact that a large allocation is no longer necessary. If it's a shared hash/BST combination, you still need to allocate the entire hash table so that seemed incongruous.

However, from the context ("... since a large array no longer needs to be allocated ..."), it appears that they mean they can make the hash table part of the dual data structure smaller as the collisions are more efficient to process. In other words, rather than a 1000-element hash table with linked lists for collisions, you can get away with a 100-element hash table because the collisions are not so damaging to the search time if you use a BST.
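The trade-off can be checked with some back-of-the-envelope arithmetic. The sketch below assumes 10,000 stored items (a made-up figure), a hash function that spreads them perfectly evenly, and a perfectly balanced tree in each bucket; the helper names `chain_cost` and `tree_cost` are hypothetical.

```python
import math


def chain_cost(items, buckets):
    """(average, worst) comparisons for a successful lookup in one chain."""
    load = items / buckets            # items per bucket under uniform hashing
    return (load + 1) / 2, load


def tree_cost(items, buckets):
    """Worst-case comparisons in one perfectly balanced BST bucket."""
    load = max(items / buckets, 1)
    return math.floor(math.log2(load)) + 1   # number of levels in the tree


print(chain_cost(10_000, 1000))  # (5.5, 10.0)   1000 buckets, linked lists
print(chain_cost(10_000, 100))   # (50.5, 100.0) 100 buckets, linked lists
print(tree_cost(10_000, 100))    # 7             100 buckets, balanced BSTs
```

Under these assumptions, the 100-bucket table with trees needs at most 7 comparisons per lookup, which is in the same range as the 1000-bucket table with chains, while shrinking the table to 100 buckets with chains would cost roughly ten times more.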

136

answered Sep 23 '22 21:09

paxdiablo

You're conflating a few terms here.

The idea would be to implement the hash table with both the array and a BST in a two-tiered fashion. One would still store values directly in the array when there is no collision; when there is one, the BST keeps retrieval of the colliding elements fast.
A trie is something entirely different; depending on what you were attempting to store, you might not be able to apply it to a hashing function.
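For contrast, here is a minimal trie sketch. It is only an illustration with hypothetical names (`TrieNode`, `insert`, `contains`, `starts_with`): keys are strings stored one character per level, so there is no bucket array and no hash of the whole key. (Each node's `children` mapping is a Python dict purely for convenience.)

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a stored key ends at this node


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def _walk(root, text):
    """Follow text character by character; return the last node or None."""
    node = root
    for ch in text:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def contains(root, word):
    node = _walk(root, word)
    return node is not None and node.is_word


def starts_with(root, prefix):
    return _walk(root, prefix) is not None


root = TrieNode()
for w in ("tree", "trie", "try"):
    insert(root, w)

print(contains(root, "trie"), contains(root, "tri"), starts_with(root, "tri"))
# True False True
```

Lookup cost here depends on the length of the key rather than on the number of stored items, and it only works for keys that decompose into a sequence of symbols, which is a different trade-off from either a hash table or a BST.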

answered Sep 25 '22 21:09

Makoto
