Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the size 127 (prime) better than 128 for a hash-table?

Supposing simple uniform hashing, that being, any given value is equally like to hash into any of the slots of the hash. Why is it better to use a table of size 127 and not 128? I really don't understand what's the problem with the power of 2 numbers. Or how it actually makes any difference at all.

When using the division method, we usually avoid certain values of m (table size). For example, m should not be a power of 2, since if m = 2^p , then h(k) is just the p lowest-order bits of k.

Let's suppose the possible elements are only between 1 and 10000 and I picked the table size as 128. How can 127 be better? So 128 is 2^6 (1000000) and 127 is 0111111. What difference does this make? All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too. Did I get something wrong?

I'm looking for some examples as I really can't understand why is this bad. Thanks a lot in advance!

PS: I am aware of: Hash table: why size should be prime?

like image 814
Clash Avatar asked May 08 '11 19:05

Clash


People also ask

Why is it better when a hash table has size that is a prime number?

They famously are only divisible by 1 and themselves. Thus, choosing to set your hash table length to a large prime number will greatly reduce the occurrence of collisions.

What is a good size for a hash table?

But a good general “rule of thumb” is: The hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and. Size of hash table array should be a prime number.

What happens when you vary the size of a hash table?

The Hashing has two parts: hash function and compression function. Changing the size of the hash table will change the compression function hence leading to the keys being allotted to different buckets.

Why do we resize hash table?

To restrict the load balance so that it does not get too large (slow search, insert, delete) or too small (waste of memory), we will increase the size of the hash table if it gets too full and decrease the size of the hash table if it gets too empty.


1 Answers

All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too.

That is wrong (or I misunderstood..). k % 127 depends on all bits of k. k % 128 only depends on the 7 lowest bits.


EDIT:

If you have a perfect distribution between 1 and 10,000. 10,000 % 127 and 10,000 % 128 both will turn this in a excellent smaller distribution. All buckets will contain 10,000 /128 = 78 (or 79) items.

If you have a distribution between 1 and 10,000 that is biased, because {x, 2x, 3x, ..} occur more often. Then a prime size will give a much, much better distribution as explained in this answer. (Unless x is exactly that prime size.)

Thus, cutting off the high bits (using a size of 128) is no problem whatsoever if the distribution in the lower bits is good enough. But, with real data and real badly designed hash functions, you will need those high bits.

like image 159
Ishtar Avatar answered Oct 03 '22 22:10

Ishtar