I recently learned about different methods to deal with collisions in hash tables and saw that the separate chaining with linked lists is always more time efficient than linear probing. For space efficiency, we allocate a predefined memory for linear probing which later on we might not use, but for separate chaining we use memory dynamically.
Is separate chaining with linked list more efficient than linear probing? If so, why do we then use linear probing at all?
This is because the memory addresses used for the single list are closer together, while separate chaining can have each data structure in different locations far apart from each other. One other advantage can be the ease of access and use of the data.
At about a load factor of 0.8, chaining starts to become more efficient due to multiple collisions: you would have to probe a lot of empty cells in order to find the actual value you want with probing, while with chaining you have a list of values that have the same hash key.
A hash collision occurs when the hash function maps a key into a cell that is already occupied by a different key. Linear probing is a strategy for resolving collisions, by placing the new key into the closest following empty cell.
I'm surprised that you saw chained hashing to be faster than linear probing - in practice, linear probing is typically significantly faster than chaining. This is primarily due to locality of reference, since the accesses performed in linear probing tend to be closer in memory than the accesses performed in chained hashing.
There are other wins in linear probing. For example, insertions into a linear probing hash table don't require any new allocations (unless you're rehashing the table), so in applications like network routers where memory is scarce, it's nice to know that once the table is set up, the elements can be placed into it with no risk of a malloc
fail.
One weakness of linear probing is that, with a bad choice of hash function, primary clustering can cause the performance of the table to degrade significantly. While chained hashing can still suffer from bad hash functions, it's less sensitive to elements with nearby hash codes, which don't adversely impact the runtime. Theoretically, linear probing only gives expected O(1) lookups if the hash functions are 5-independent or if there's sufficient entropy in the keys. There are many ways to address this, since as using the Robin Hood hashing technique or hopscotch hashing, both of which have significantly better worst-cases than vanilla linear probing.
The other weakness of linear probing is that its performance significantly degrades as the load factor approaches 1. You can address this either by rehashing periodically or by using the Robin Hood hashing technique described above.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With