Assuming simple uniform hashing, that is, any given value is equally likely to hash into any of the slots of the table: why is it better to use a table of size 127 and not 128? I really don't understand what the problem with power-of-2 sizes is, or how it actually makes any difference at all.
When using the division method, we usually avoid certain values of m (the table size). For example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k.
Let's suppose the possible keys are only between 1 and 10,000 and I pick the table size as 128. How can 127 be better? So 128 is 2^7 (10000000 in binary) and 127 is 1111111. What difference does this make? All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too. Did I get something wrong?
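To make the bit patterns I'm talking about concrete (the key 10000 is just an arbitrary example):

```python
# Quick check of the bit patterns in question.
print(format(128, 'b'))          # 10000000 -- 2^7
print(format(127, 'b'))          # 1111111  -- seven 1s

# k % 128 keeps exactly the 7 lowest-order bits of k:
k = 10000
print(k % 128, k & 0b1111111)    # 16 16 -- identical
```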
I'm looking for some examples, as I really can't understand why this is bad. Thanks a lot in advance!
PS: I am aware of: Hash table: why size should be prime?
Primes are famously divisible only by 1 and themselves. Thus, setting your hash table length to a large prime number will greatly reduce the occurrence of collisions.
But a good general rule of thumb is: the hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and the size of the hash table array should be a prime number.
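A minimal Python sketch of that rule of thumb (the trial-division primality test and the pick_table_size name are just illustrative, not from any particular library):

```python
def is_prime(n):
    """Trial division; fine for the table sizes involved here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def pick_table_size(max_keys):
    """Smallest prime >= 1.3 * max_keys, per the rule of thumb above."""
    candidate = int(1.3 * max_keys)
    while not is_prime(candidate):
        candidate += 1
    return candidate

print(pick_table_size(10000))    # 13001
```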
Hashing has two parts: a hash function and a compression function. Changing the size of the hash table changes the compression function, and hence the keys get allotted to different buckets.
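A minimal sketch of that split (the polynomial string hash and the function names are illustrative choices, not a specific library's API):

```python
def hash_code(key: str) -> int:
    """Hash function: map a key to a (large) integer."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)     # classic polynomial string hash
    return h

def compress(code: int, table_size: int) -> int:
    """Compression function: map the hash code to a bucket index."""
    return code % table_size

# The same key lands in a different bucket once the table size changes:
print(compress(hash_code("apple"), 127))   # 59
print(compress(hash_code("apple"), 128))   # 90
```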
To keep the load factor from getting too large (slow search, insert, delete) or too small (wasted memory), we increase the size of the hash table when it gets too full and decrease it when it gets too empty.
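A sketch of the grow-on-full half of that policy (the 0.75 threshold and the _next_prime helper are illustrative choices, not prescribed by anything above):

```python
def _next_prime(n):
    """Smallest prime >= n (trial division is fine at these sizes)."""
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

class HashTable:
    """Chained hash table that grows when its load factor gets too high."""

    def __init__(self, size=127):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def insert(self, key, value):
        self.buckets[hash(key) % self.size].append((key, value))
        self.count += 1
        if self.count / self.size > 0.75:   # too full: searches slow down
            self._resize(_next_prime(self.size * 2))
        # (a full version would also shrink when deletions push the
        #  load factor below some lower threshold, e.g. 0.25)

    def _resize(self, new_size):
        old_items = [kv for bucket in self.buckets for kv in bucket]
        self.size, self.count = new_size, 0
        self.buckets = [[] for _ in range(new_size)]
        for key, value in old_items:
            self.insert(key, value)
```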
"All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too."

That is wrong (or I misunderstood). k % 127 depends on all bits of k; k % 128 depends only on the 7 lowest bits.
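You can see this directly with two keys that share their 7 low bits but differ in the high bits (the keys below are arbitrary):

```python
a = 0b0000_0101_0011    # 83
b = 0b1010_0101_0011    # 2643: same 7 low bits as a, different high bits

print(a % 128, b % 128)   # 83 83  -- the high bits are simply discarded
print(a % 127, b % 127)   # 83 103 -- the high bits still matter
```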
EDIT:
If you have a perfect distribution of keys between 1 and 10,000, then taking the keys mod 127 or mod 128 will both turn it into an excellent smaller distribution: every bucket will contain about 10,000 / 128 = 78 (or 79) items.
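A quick experiment confirming the uniform case:

```python
from collections import Counter

keys = range(1, 10001)                    # perfectly uniform keys

for m in (127, 128):
    sizes = Counter(k % m for k in keys).values()
    print(m, min(sizes), max(sizes))      # both print: m 78 79
```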
If you have a distribution between 1 and 10,000 that is biased, because {x, 2x, 3x, ...} occur more often, then a prime size will give a much, much better distribution, as explained in this answer (unless x is a multiple of that prime).
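Here is the biased case made concrete, with x = 4 as an arbitrary choice (any x sharing a factor with 128 behaves the same way):

```python
keys = range(4, 10001, 4)                 # biased input: multiples of x = 4

for m in (127, 128):
    used = len(set(k % m for k in keys))
    print(m, "buckets used:", used)
# 127 buckets used: 127   (gcd(4, 127) = 1: every bucket is reachable)
# 128 buckets used: 32    (gcd(4, 128) = 4: only 128/4 buckets are hit)
```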
Thus, cutting off the high bits (using a size of 128) is no problem whatsoever if the distribution of the lower bits is good enough. But with real data and really badly designed hash functions, you will need those high bits.