When to resize a hash table?

In various hash table implementations, I have seen "magic numbers" for when a mutable hash table should resize (grow). Usually this number is somewhere between 65% and 80% of the allocated slots being occupied. I assume the trade-off is that a higher number allows more collisions, while a lower number gives fewer collisions at the expense of using more memory.

My question is how is this number arrived at?

Is it arbitrary? Based on testing? Based on some other logic?
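To make the question concrete, here is a minimal sketch of the kind of check being asked about. It is not taken from any particular library: the class name `ProbingTable`, the `max_load` parameter, and the default of 0.75 are all assumptions chosen for illustration. The table uses linear probing and doubles its capacity whenever the next insert would push the occupied fraction past the threshold.

```python
class ProbingTable:
    """Toy open-addressing hash table with linear probing (illustrative only)."""

    def __init__(self, capacity=8, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load  # hypothetical threshold: the "magic number"
        self.count = 0            # number of occupied slots
        self.resizes = 0          # how many times the table has grown
        self.slots = [None] * capacity

    def _find(self, key):
        # Walk forward from the home slot until we hit the key or an empty slot.
        # An empty slot always exists because max_load is below 1.
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def put(self, key, value):
        # Grow if one more entry would exceed the threshold. For simplicity
        # this also fires when the key already exists, which is harmless.
        if (self.count + 1) / self.capacity > self.max_load:
            self._grow()
        i = self._find(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._find(key)]
        return default if slot is None else slot[1]

    def _grow(self):
        # Double the capacity and reinsert every entry at its new position.
        entries = [s for s in self.slots if s is not None]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.resizes += 1
        for key, value in entries:
            self.slots[self._find(key)] = (key, value)


table = ProbingTable()
for n in range(100):
    table.put(n, n * n)
print(table.count, table.capacity, table.resizes)
```

Lowering `max_load` makes the table grow earlier (more memory, shorter probe runs); raising it does the opposite, which is exactly the trade-off described above.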

380

asked Feb 10 '11 16:02

Nick Van Brunt

2 Answers

At a guess, most people at least start from the numbers in a book (e.g., Knuth, Volume 3), which were produced by testing. Depending on the situation, some may carry out testing afterwards, and make adjustments accordingly -- but from what I've seen, these are probably in the minority.

As I outlined in a previous answer, the "right" number also depends heavily on how you resolve collisions. For better or worse, this fact seems to be widely ignored -- people frequently don't pick numbers that are particularly appropriate for the collision resolution they use.

OTOH, the other point I found in my testing is that it only rarely makes a whole lot of difference. You can pick numbers across a fairly broad range and get pretty similar overall speed. The main thing is to be careful to avoid pushing the number too high, especially if you're using something like linear probing for collision resolution.
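As a rough illustration of why linear probing in particular punishes a high threshold, the sketch below evaluates the standard approximations from Knuth, Volume 3 for the expected number of probes at a given load. These formulas assume a hash function that spreads keys uniformly, so real tables may behave differently. The function names are my own, and the separate-chaining figure is included only for contrast.

```python
def linear_probing_probes(load):
    """Knuth's approximations for linear probing, valid for 0 <= load < 1.

    Returns (probes for a successful search, probes for an unsuccessful one).
    """
    hit = 0.5 * (1 + 1 / (1 - load))
    miss = 0.5 * (1 + 1 / (1 - load) ** 2)
    return hit, miss


def chaining_hit_probes(load):
    """Approximate probes for a successful search with separate chaining."""
    return 1 + load / 2


for load in (0.50, 0.65, 0.80, 0.90, 0.95):
    hit, miss = linear_probing_probes(load)
    print(f"load {load:.2f}: linear probing hit ~{hit:.1f}, miss ~{miss:.1f}; "
          f"chaining hit ~{chaining_hit_probes(load):.2f}")
```

Between 65% and 80% the linear-probing numbers change only moderately, but the unsuccessful-search cost climbs steeply beyond that, while chaining degrades gently. That matches the observation that a broad range works about equally well as long as you do not push the number too high.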

157

answered Oct 10 '22 17:10

Jerry Coffin

That depends on the keys. If you know that your hash function is perfect for all possible keys (for example, one generated with gperf), then you know that you'll have few or no collisions, so the number can be higher.

But most of the time, you don't know much about the keys except that they are text. In this case, you have to guess since you don't even have test data to figure out in advance how your hash function is behaving.

So you hope for the best. If your hash function is very bad for the keys, then you will have a lot of collisions and the point of growth will never be reached. In this case, the chosen figure is irrelevant.

If your hash function is adequate, then it should create only a few collisions (fewer than 50%), so a number between 65% and 80% seems reasonable.
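If you do have sample keys, one way to check whether a hash function is "adequate" in this sense is simply to count collisions. The sketch below is only an illustration: the helper `collision_fraction`, the made-up `user…` keys, and the two hash functions are assumptions, with `zlib.crc32` standing in for a reasonable hash and string length standing in for a very bad one.

```python
import zlib


def collision_fraction(keys, slots, hash_fn):
    """Fraction of keys whose home slot was already taken by an earlier key."""
    taken = set()
    collisions = 0
    for key in keys:
        slot = hash_fn(key) % slots
        if slot in taken:
            collisions += 1
        else:
            taken.add(slot)
    return collisions / len(keys)


def crc_hash(key):
    # Stand-in for a reasonable general-purpose hash of text keys.
    return zlib.crc32(key.encode("utf-8"))


def bad_hash(key):
    # Deliberately poor: only as many distinct values as there are key lengths.
    return len(key)


keys = [f"user{i}" for i in range(700)]  # hypothetical sample keys
good = collision_fraction(keys, 1000, crc_hash)  # table at 70% load
bad = collision_fraction(keys, 1000, bad_hash)
print(f"crc32: {good:.0%} collisions, length-based: {bad:.0%} collisions")
```

With the length-based hash almost every key collides regardless of how many slots are allocated, so tuning the threshold cannot help; with a reasonable hash the collision rate depends mainly on the load.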

That said: Unless your hash table must be perfect (= huge size or lots of accesses), don't bother. If you have, say, ten elements, considering these issues is a waste of time.

answered Oct 10 '22 15:10

Aaron Digulla

Tags: algorithm, hashtable