Assuming simple uniform hashing, that is, any given value is equally likely to hash into any of the slots of the table: why is it better to use a table of size 127 and not 128? I really don't understand what the problem with power-of-2 sizes is, or how it actually makes any difference at all.
When using the division method, we usually avoid certain values of m (the table size). For example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k.
Let's suppose the possible keys are only between 1 and 10,000 and I pick the table size as 128. How can 127 be better? So 128 is 2^7 (10000000 in binary) and 127 is 1111111. What difference does this make? All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too. Did I get something wrong?
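To make the bit patterns I'm talking about concrete (the key 10000 is just an arbitrary example):

```python
# Quick check of the bit patterns in question.
print(format(128, 'b'))          # 10000000 -- 2^7
print(format(127, 'b'))          # 1111111  -- seven 1s

# k % 128 keeps exactly the 7 lowest-order bits of k:
k = 10000
print(k % 128, k & 0b1111111)    # 16 16 -- identical
```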
I'm looking for some examples, as I really can't understand why this is bad. Thanks a lot in advance!
PS: I am aware of: Hash table: why size should be prime?
Primes are famously divisible only by 1 and themselves. Thus, setting your hash table length to a large prime number will greatly reduce the occurrence of collisions.
But a good general rule of thumb is: the hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and the size of the hash table array should be a prime number.
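A minimal Python sketch of that rule of thumb (the trial-division primality test and the pick_table_size name are just illustrative, not from any particular library):

```python
def is_prime(n):
    """Trial division; fine for the table sizes involved here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def pick_table_size(max_keys):
    """Smallest prime >= 1.3 * max_keys, per the rule of thumb above."""
    candidate = int(1.3 * max_keys)
    while not is_prime(candidate):
        candidate += 1
    return candidate

print(pick_table_size(10000))    # 13001
```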
Hashing has two parts: a hash function and a compression function. Changing the size of the hash table changes the compression function, and hence the keys get allotted to different buckets.
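A minimal sketch of that split (the polynomial string hash and the function names are illustrative choices, not a specific library's API):

```python
def hash_code(key: str) -> int:
    """Hash function: map a key to a (large) integer."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)     # classic polynomial string hash
    return h

def compress(code: int, table_size: int) -> int:
    """Compression function: map the hash code to a bucket index."""
    return code % table_size

# The same key lands in a different bucket once the table size changes:
print(compress(hash_code("apple"), 127))   # 59
print(compress(hash_code("apple"), 128))   # 90
```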
To keep the load factor from getting too large (slow search, insert, delete) or too small (wasted memory), we increase the size of the hash table when it gets too full and decrease it when it gets too empty.
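A sketch of the grow-on-full half of that policy (the 0.75 threshold and the _next_prime helper are illustrative choices, not prescribed by anything above):

```python
def _next_prime(n):
    """Smallest prime >= n (trial division is fine at these sizes)."""
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

class HashTable:
    """Chained hash table that grows when its load factor gets too high."""

    def __init__(self, size=127):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def insert(self, key, value):
        self.buckets[hash(key) % self.size].append((key, value))
        self.count += 1
        if self.count / self.size > 0.75:   # too full: searches slow down
            self._resize(_next_prime(self.size * 2))
        # (a full version would also shrink when deletions push the
        #  load factor below some lower threshold, e.g. 0.25)

    def _resize(self, new_size):
        old_items = [kv for bucket in self.buckets for kv in bucket]
        self.size, self.count = new_size, 0
        self.buckets = [[] for _ in range(new_size)]
        for key, value in old_items:
            self.insert(key, value)
```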
"All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too."

That is wrong (or I misunderstood). k % 127 depends on all bits of k; k % 128 depends only on the 7 lowest bits.
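You can see this directly with two keys that share their 7 low bits but differ in the high bits (the keys below are arbitrary):

```python
a = 0b0000_0101_0011    # 83
b = 0b1010_0101_0011    # 2643: same 7 low bits as a, different high bits

print(a % 128, b % 128)   # 83 83  -- the high bits are simply discarded
print(a % 127, b % 127)   # 83 103 -- the high bits still matter
```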
EDIT:
If you have a perfect distribution of keys between 1 and 10,000, then taking the keys mod 127 or mod 128 will both turn it into an excellent smaller distribution: every bucket will contain about 10,000 / 128 = 78 (or 79) items.
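A quick experiment confirming the uniform case:

```python
from collections import Counter

keys = range(1, 10001)                    # perfectly uniform keys

for m in (127, 128):
    sizes = Counter(k % m for k in keys).values()
    print(m, min(sizes), max(sizes))      # both print: m 78 79
```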
If you have a distribution between 1 and 10,000 that is biased, because {x, 2x, 3x, ...} occur more often, then a prime size will give a much, much better distribution, as explained in this answer (unless x is a multiple of that prime).
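Here is the biased case made concrete, with x = 4 as an arbitrary choice (any x sharing a factor with 128 behaves the same way):

```python
keys = range(4, 10001, 4)                 # biased input: multiples of x = 4

for m in (127, 128):
    used = len(set(k % m for k in keys))
    print(m, "buckets used:", used)
# 127 buckets used: 127   (gcd(4, 127) = 1: every bucket is reachable)
# 128 buckets used: 32    (gcd(4, 128) = 4: only 128/4 buckets are hit)
```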
Thus, cutting off the high bits (using a size of 128) is no problem whatsoever if the distribution of the lower bits is good enough. But with real data and really badly designed hash functions, you will need those high bits.