Why are hash table expansions usually done by doubling the size?

Tags: algorithm, hashtable, data-structures, hash

I've done a little research on hash tables, and I keep running across the rule of thumb that when there are a certain number of entries (either max or via a load factor like 75%) the hash table should be expanded.

Almost always, the recommendation is to double (or double plus 1, i.e., 2n+1) the size of the hash table. However, I haven't been able to find a good reason for this.

Why double the size, rather than, say, increasing it 25%, or increasing it to the size of the next prime number, or next k prime numbers (e.g., three)?

I already know that it's often a good idea to choose an initial hash table size that is a prime number, at least if your hash function uses a modulus, as in universal hashing. And I know that's why it's usually recommended to do 2n+1 instead of 2n (e.g., http://www.concentric.net/~Ttwang/tech/hashsize.htm).

However, as I said, I haven't seen any real explanation for why doubling or doubling-plus-one is actually a better choice than some other method of choosing a size for the new hash table.

(And yes, I've read the Wikipedia article on hash tables: http://en.wikipedia.org/wiki/Hash_table :)

asked Mar 03 '10 07:03

Chirael

1 Answer

Hash tables could not claim "amortized constant time insertion" if, for instance, the resizing was by a constant increment. In that case the cost of resizing (which grows with the size of the hash table) would make the amortized cost of one insertion linear in the total number of elements to insert. Because resizing becomes more and more expensive as the table grows, it has to happen "less and less often" to keep the amortized cost of insertion constant.
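
To make the amortized argument concrete, here is a minimal sketch that counts how many elements get moved by resizes while inserting n items. It is an illustration, not how any particular library does it: the table resizes only when completely full, the initial capacity of 8 and the fixed step of 8 are arbitrary, and the names `total_copies`, `doubling`, `grow_25_percent` and `fixed_increment` are made up for this example.

```python
def total_copies(n, grow, initial_capacity=8):
    """Count elements moved by resizes while inserting n items one at a time.

    Simplifying assumption: the table resizes only when it is completely full.
    """
    capacity = initial_capacity
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size            # every stored element is rehashed into the new table
            capacity = grow(capacity)
        size += 1
    return copies


def doubling(capacity):
    return 2 * capacity


def grow_25_percent(capacity):
    # Geometric growth by roughly 25% (integer arithmetic, always at least +1).
    return capacity + max(1, capacity // 4)


def fixed_increment(capacity):
    return capacity + 8               # hypothetical constant step


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        per_insert = [total_copies(n, g) / n
                      for g in (doubling, grow_25_percent, fixed_increment)]
        print(n, per_insert)          # copies per insertion for each policy
```

With doubling, the copies per insertion stay below 2 however large n gets. Growing by 25% also stays bounded, only with a larger constant, so the property that matters is that growth is geometric rather than that the factor is exactly 2. With the fixed step, the copies per insertion keep growing in proportion to n.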

Most implementations allow the average bucket occupation to grow until it reaches a bound fixed in advance before resizing (anywhere between 0.5 and 3, all of which are acceptable values). With this convention, just after resizing the average bucket occupation becomes half that bound. Resizing by doubling keeps the average bucket occupation within a band spanning a factor of 2.
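
The factor-of-2 band can be checked with a small simulation. This is only a sketch under assumed parameters: the bound of 0.75 and the 16 initial buckets are arbitrary choices, and `load_factor_band` is a made-up helper name.

```python
def load_factor_band(n, max_load=0.75, initial_buckets=16):
    """Insert n items, doubling the bucket count whenever the average
    occupation would exceed max_load. Return the (lowest, highest) average
    occupation observed after the first resize."""
    buckets = initial_buckets
    size = 0
    resized = False
    lowest = float("inf")
    highest = 0.0
    for _ in range(n):
        size += 1
        if size > max_load * buckets:
            buckets *= 2              # doubling halves the average occupation
            resized = True
        if resized:
            load = size / buckets
            lowest = min(lowest, load)
            highest = max(highest, load)
    return lowest, highest


if __name__ == "__main__":
    print(load_factor_band(100_000))
```

After the first resize, the average occupation never drops to half the bound or below, and never exceeds the bound itself.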

Sub-note: because of statistical clustering, you have to take an average bucket occupation as low as 0.5 if you want many buckets to have at most one element (maximum lookup speed, ignoring the complex effects of cache size), or as high as 3 if you want a minimum number of empty buckets (which correspond to wasted space).
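
One way to put rough numbers on this trade-off is to assume that the hash function spreads keys uniformly over a large number of buckets, so that the number of keys per bucket is approximately Poisson-distributed with mean equal to the average occupation. That model is an assumption added here for illustration and is not part of the answer above, and `poisson_bucket_stats` is a made-up name.

```python
import math


def poisson_bucket_stats(load):
    """Approximate bucket statistics for average occupation `load`,
    assuming Poisson-distributed bucket sizes (uniform hashing, many buckets).

    Returns (fraction of empty buckets, fraction of buckets with at most one key).
    """
    empty = math.exp(-load)                   # P(0 keys in a bucket)
    at_most_one = (1 + load) * math.exp(-load)  # P(0 keys) + P(1 key)
    return empty, at_most_one


if __name__ == "__main__":
    for load in (0.5, 1.0, 3.0):
        empty, at_most_one = poisson_bucket_stats(load)
        print(f"load={load}: empty={empty:.3f}, at most one key={at_most_one:.3f}")
```

Under this model, an average occupation of 0.5 leaves about 91% of buckets with at most one key but about 61% of them empty, while an average occupation of 3 leaves only about 5% empty but only about 20% with at most one key.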

answered Sep 28 '22 14:09

Pascal Cuoq
