Chained Hash Tables vs. Open-Addressed Hash Tables

2 Answers

Wikipedia's article on hash tables gives a distinctly better explanation and overview of different hash table schemes that people have used than I'm able to off the top of my head. In fact you're probably better off reading that article than asking the question here. :)

That said...

A chained hash table indexes into an array of pointers to the heads of linked lists. Each linked list cell has the key for which it was allocated and the value which was inserted for that key. When you want to look up a particular element from its key, the key's hash is used to work out which linked list to follow, and then that particular list is traversed to find the element that you're after. If more than one key in the hash table has the same hash, then you'll have linked lists with more than one element.

The downside of chained hashing is having to follow pointers in order to search linked lists. The upside is that chained hash tables only get linearly slower as the load factor (the ratio of elements in the hash table to the length of the bucket array) increases, even if it rises above 1.
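To make the structure concrete, here is a minimal sketch of a chained table in Python. The class and method names (`ChainedHashTable`, `put`, `get`, `load_factor`) are made up for illustration, and it leans on Python's built-in `hash()`; it is not meant to mirror any particular library.

```python
class ChainedHashTable:
    """Illustrative chained hash table: an array of linked-list heads."""

    class _Node:
        __slots__ = ("key", "value", "next")

        def __init__(self, key, value, next_node):
            self.key = key
            self.value = value
            self.next = next_node

    def __init__(self, num_buckets=8):
        self._buckets = [None] * num_buckets  # heads of the linked lists
        self._size = 0

    def _index(self, key):
        # The key's hash picks which linked list to follow.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self._buckets[i]
        while node is not None:          # overwrite if the key already exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise prepend a new cell to this bucket's list.
        self._buckets[i] = self._Node(key, value, self._buckets[i])
        self._size += 1

    def get(self, key, default=None):
        node = self._buckets[self._index(key)]
        while node is not None:          # traverse the chain
            if node.key == key:
                return node.value
            node = node.next
        return default

    def load_factor(self):
        return self._size / len(self._buckets)


table = ChainedHashTable(num_buckets=4)
for n in range(10):
    table.put(n, n * n)   # 10 elements in 4 buckets: load factor above 1
```

Note that the table keeps working with more elements than buckets; the chains just get longer.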

An open-addressing hash table indexes into an array of pointers to pairs of (key, value). You use the key's hash value to work out which slot in the array to look at first. If more than one key in the hash table has the same hash, then you use some scheme to decide on another slot to look in instead. For example, linear probing is where you look at the next slot after the one chosen, and then the next slot after that, and so on until you either find a slot that matches the key you're looking for, or you hit an empty slot (in which case the key must not be there).
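A rough sketch of linear probing in Python follows. The names (`LinearProbingTable`, `put`, `get`) are hypothetical, the table has a fixed capacity, and deletion is left out on purpose because removing entries from an open-addressed table needs extra care (for example tombstone markers).

```python
class LinearProbingTable:
    """Illustrative open-addressing table using linear probing."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair
        self._size = 0

    def _probe(self, key):
        # Start at the slot chosen by the hash, then try the next slot,
        # and the next, wrapping around the array at most once.
        capacity = len(self._slots)
        start = hash(key) % capacity
        for step in range(capacity):
            yield (start + step) % capacity

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:             # empty slot: the key was not present
                self._slots[i] = (key, value)
                self._size += 1
                return
            if slot[0] == key:           # same key: overwrite the value
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:             # hit an empty slot: key is absent
                return default
            if slot[0] == key:
                return slot[1]
        return default                   # scanned every slot without a match


table = LinearProbingTable(capacity=4)
table.put(0, "a")
table.put(4, "b")   # same starting slot as key 0, so it lands in the next one
```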

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. It gets very, very slow if the load factor approaches 1, because you end up usually having to search through many of the slots in the bucket array before you find either the key that you were looking for or an empty slot. Also, you can never have more elements in the hash table than there are entries in the bucket array.
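To put rough numbers on that, the snippet below evaluates the classical textbook approximations for the expected number of probes per lookup (Knuth's estimates for linear probing, and the usual simple-uniform-hashing estimates for chaining). The function names are mine, and the formulas assume a well-behaved hash function; treat the results as ballpark figures, not measurements.

```python
def linear_probing_probes(load_factor):
    """Approximate expected probes for (successful, unsuccessful) lookups."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1) for open addressing")
    hit = 0.5 * (1 + 1 / (1 - load_factor))
    miss = 0.5 * (1 + 1 / (1 - load_factor) ** 2)
    return hit, miss


def chaining_probes(load_factor):
    """Approximate expected list nodes examined for (successful, unsuccessful) lookups."""
    hit = 1 + load_factor / 2
    miss = load_factor
    return hit, miss


for alpha in (0.25, 0.5, 0.7, 0.9, 0.99):
    print(alpha, linear_probing_probes(alpha), chaining_probes(alpha))
```

The linear-probing figures blow up as the load factor nears 1, while the chaining figures grow only linearly.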

To deal with the fact that all hash tables at least get slower (and in some cases actually break completely) when their load factor approaches 1, practical hash table implementations make the bucket array larger (by allocating a new bucket array, and copying elements from the old one into the new one, then freeing the old one) when the load factor gets above a certain value (typically about 0.7).
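Here is one way the resize step might look, sketched in Python on top of a simple chained table. The 0.7 threshold comes from the paragraph above; doubling the array and the names `ResizingTable` and `_grow` are assumptions made for the example. In Python the old array is freed by the garbage collector, so there is no explicit free step.

```python
MAX_LOAD_FACTOR = 0.7  # threshold from the text; real libraries pick their own


class ResizingTable:
    """Illustrative chained table that grows when the load factor gets too high."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def _grow(self):
        # Allocate a new, larger bucket array (doubling is an assumption here)
        # and re-insert every element, since each key's bucket index changes.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key, value in bucket:
                self._bucket(key).append((key, value))

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > MAX_LOAD_FACTOR:
            self._grow()

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def capacity(self):
        return len(self._buckets)


table = ResizingTable()
for n in range(100):
    table.put(n, str(n))
```

Each resize costs time proportional to the number of elements, but because the array grows geometrically the cost averages out to a constant per insertion.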

There are lots of variations on all of the above. Again, please see the Wikipedia article; it really is quite good.

For a library that is meant to be used by other people, I would strongly recommend against experimenting. Since hash tables are generally quite performance-critical, you're usually best off using somebody else's implementation which has already been carefully tuned. There are lots of open-source BSD, LGPL and GPL licensed hash table implementations.

If you're working with GTK, for example, then you'll find that there's a good hash table in GLib.

answered Nov 07 '22 22:11

Richard Barrell

My understanding (in simple terms) is that both methods have pros and cons, though most libraries use the chaining strategy.

Chaining Method:

Here each slot of the hash table's array maps to a linked list of items. This is efficient if the number of collisions is fairly small. The worst-case lookup is O(n), where n is the number of elements in the table; that happens when every element ends up in the same list.

Open Addressing with Linear Probe:

Here, when a collision occurs, we move on to the next index until we find an open spot. So, if the number of collisions is low, this is very fast and space efficient. The limitation is that the total number of entries in the table is limited by the size of the array. This is not the case with chaining.

There is another approach, which is chaining with binary search trees. In this approach, when a collision occurs, the colliding items are stored in a binary search tree instead of a linked list. Hence, the worst-case lookup here would be O(log n), provided the tree is kept balanced. In practice, this approach is best suited to cases where the keys are distributed extremely nonuniformly across the buckets.
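A small Python sketch of the tree-bucket idea is shown below. The names are hypothetical, keys must be comparable with `<`, and the tree here is a plain unbalanced binary search tree to keep the example short; getting a true O(log n) worst case would need a self-balancing tree such as a red-black tree.

```python
class _TreeNode:
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


class TreeChainedTable:
    """Illustrative chained table whose buckets are (unbalanced) binary search trees."""

    def __init__(self, num_buckets=8):
        self._roots = [None] * num_buckets  # one tree root per bucket

    def put(self, key, value):
        i = hash(key) % len(self._roots)
        node = self._roots[i]
        if node is None:
            self._roots[i] = _TreeNode(key, value)
            return
        while True:
            if key == node.key:              # existing key: overwrite
                node.value = value
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:                # free position: attach a new node
                setattr(node, side, _TreeNode(key, value))
                return
            node = child

    def get(self, key, default=None):
        node = self._roots[hash(key) % len(self._roots)]
        while node is not None:              # ordinary BST search in the bucket
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return default


# A single bucket forces every key to collide, so all lookups go through the tree.
table = TreeChainedTable(num_buckets=1)
for k in (5, 2, 8, 1, 9):
    table.put(k, k * 10)
```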

answered Nov 07 '22 22:11

Yuvi

Tags:

hashtable

data-structures

Andrei Ciobanu
