Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to resize a hash table

I am creating my own implementation to hash a table for education purposes.

What would be the best way to increase a hash table size?

I currently double the hash array size.

The hashing function I'm using is: key mod arraysize.

The problem with this is that if the keys are: 2, 4, 6, 8, then the array size will just keep increasing.

What is the best way of overcoming this issue? Is there a better way of increasing a hash table size? Would changing my hashing function help?

NOTE: My keys are all integers!

like image 870
Yahya Uddin Avatar asked Mar 16 '14 13:03

Yahya Uddin


People also ask

How hash tables grow and shrink?

The usable size of a hash table is the number of associations that the table can hold at a given time. If the number of associations in the table exceeds the usable size, the table will automatically grow, increasing the usable size to a new value that is sufficient to hold the associations.

What is the time complexity of resizing a hash table?

Each resizing operation takes O(n) time where n is the size of the hash table being resized. Therefore the O(1) performance of the hash table operations no longer holds in the case of add : its worst-case performance is O(n).

Why is it better when a hash table has size that is a prime number?

They famously are only divisible by 1 and themselves. Thus, choosing to set your hash table length to a large prime number will greatly reduce the occurrence of collisions.

When should a hash table be expanded?

I've done a little research on hash tables, and I keep running across the rule of thumb that when there are a certain number of entries (either max or via a load factor like 75%) the hash table should be expanded. Almost always, the recommendation is to double (or double plus 1, i.e., 2n+1) the size of the hash table.


1 Answers

Hash tables often avoid this problem by making sure that the hash table size is a prime number. When you resize the table, double the size and then round up to the first prime number larger than that. Doing this avoids the clustering problems similar to what you describe.

Now, it does take a little bit of time to find the next prime number, but not a whole lot. When compared to the time involved in rehashing the hash table's contents, finding the next prime number takes almost no time at all. See Optimizing the wrong thing for a description.

like image 111
Jim Mischel Avatar answered Oct 11 '22 06:10

Jim Mischel