I'm fairly new to the the concept of hash tables, and I've been reading up on different types of hash table lookup and insertion techniques.
I'm wondering what the difference is between the time complexities of linear probing, chaining, and quadratic probing?
I'm mainly interested in the the insertion, deletion, and search of nodes in the hash table. So if I graph the system time per process ( insert/search/delete process ) versus the process number, how would the graphs differ?
I'm guessing that: - quadratic probing would be worst-case O(nlogn) or O(logn) for searching - linear probing would be worst-case O(n) for search - Not sure but I think O(n^2) for chaining
Could someone confirm this? Thanks!
It's actually surprisingly difficult to accurately analyze all of these different hashing schemes for a variety of reasons. First, unless you make very strong assumptions on your hash function, it is difficult to analyze the behavior of these hashing schemes accurately, because different types of hash functions can lead to different performances. Second, the interactions with processor caches mean that certain types of hash tables that are slightly worse in theory can actually outperform hash tables that are slightly better in theory because their access patterns are better.
If you assume that your hash function looks like a truly random function, and if you keep the load factor in the hash table to be at most a constant, all of these hashing schemes have expected O(1) lookup times. In other words, each scheme, on expectation, only requires you to do a constant number of lookups to find any particular element.
In theory, linear probing is a bit worse than quadratic hashing and chained hashing, because elements tend to cluster near one another unless the hash function has strong theoretical properties. However, in practice it can be extremely fast because of locality of reference: all of the lookups tend to be close to one another, so fewer cache misses occur. Quadratic probing has fewer collisions, but doesn't have as good locality. Chained hashing tends to have extremely few collisions, but tends to have poorer locality of reference because the chained elements are often not stored contiguously.
In the worst case, all of these data structures can take O(n) time to do a lookup. It's extremely unlikely for this to occur. In linear probing, this would require all the elements to be stored continuously with no gaps and you would have to look up one of the first elements. With quadratic hashing, you'd have to have a very strange looking set of buckets in order to get this behavior. With chained hashing, your hash function would have to dump every single element into the same bucket to get the absolute worst-case behavior. All of these are exponentially unlikely.
In short, if you pick any of these data structures, you are unlikely to get seriously burned unless you have a very bad hash function. I would suggest using chained hashing as a default since it has good performance and doesn't hit worst-case behavior often. If you know you have a good hash function, or have a small hash table, then linear probing might be a good option.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With