Why is setting a HashTable's length to a prime number a good practice?

I was going through Eric Lippert's latest blog post, Guidelines and rules for GetHashCode, when I hit this paragraph:

We could be even more clever here; just as a List resizes itself when it gets full, the bucket set could resize itself as well, to ensure that the average bucket length stays low. Also, for technical reasons it is often a good idea to make the bucket set length a prime number, rather than 100. There are plenty of improvements we could make to this hash table. But this quick sketch of a naive implementation of a hash table will do for now. I want to keep it simple.

So it looks like I'm missing something. Why is it a good practice to set it to a prime number?

542

asked Mar 01 '11 08:03

Shekhar_Pro

1 Answer

You can find people arguing for either end of the spectrum. On the one hand, choosing a prime number for the size of the hash table reduces the chance of collisions even if the hash function is not very effective at distributing its results. Note that if (in the simplest case to argue about) the size is a power of 2, only the lower bits of the hash determine the bucket, whereas with a prime size most bits of the hash result are used.
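To make the point about lower bits concrete, here is a minimal sketch. The hash values are made up for illustration (all multiples of 16, so their low 4 bits are always zero), and `bucket_counts` is a hypothetical helper, not part of any library. It only assumes the bucket is chosen as `hash % table_size`.

```python
from collections import Counter

def bucket_counts(hashes, table_size):
    """Count how many hash values land in each bucket under hash % table_size."""
    return Counter(h % table_size for h in hashes)

# Hypothetical output of a weak hash function: every value is a multiple of 16,
# so the low 4 bits carry no information.
weak_hashes = [16 * i for i in range(64)]

power_of_two = bucket_counts(weak_hashes, 16)  # table size 2**4
prime = bucket_counts(weak_hashes, 17)         # nearby prime table size

print("buckets used with size 16:", len(power_of_two))
print("buckets used with size 17:", len(prime))
print("longest bucket with size 17:", max(prime.values()))
```

With size 16 every value collides in a single bucket, because only the (constant) low bits are looked at. With size 17 the same values are spread over all the buckets.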

On the other hand, you can gain more by choosing a better hash function, or even by rehashing the result of the hash function with a few bit operations, and then using a power-of-2 table size to speed up the calculation.
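As a rough sketch of that second approach: the `spread` step below is one simple mixing function chosen for illustration (it folds the upper 16 bits of a 32-bit hash into the lower 16). It is an assumption of this example, not the exact function of any particular library, and `bucket_index` is likewise a hypothetical name. The input hashes are made up so that they differ only in their high bits.

```python
def spread(h):
    """Fold the upper 16 bits of a 32-bit hash into the lower 16 bits."""
    h &= 0xFFFFFFFF          # treat the hash as an unsigned 32-bit value
    return h ^ (h >> 16)

def bucket_index(h, table_size):
    """Pick a bucket for a power-of-two table size using a bit mask."""
    if table_size <= 0 or table_size & (table_size - 1) != 0:
        raise ValueError("table_size must be a power of two")
    # For a power-of-two size, masking is equivalent to modulo but cheaper.
    return spread(h) & (table_size - 1)

# Hypothetical weak hashes: they differ only in bits 16 and above.
hashes = [i << 16 for i in range(64)]

raw_buckets = {h & 15 for h in hashes}                 # mask only
mixed_buckets = {bucket_index(h, 16) for h in hashes}  # mix, then mask

print("buckets used without mixing:", len(raw_buckets))
print("buckets used with mixing:", len(mixed_buckets))
```

Without the mixing step all of these values fall into one bucket of a 16-entry table. With it, the high bits reach the mask and every bucket is used. The extra XOR and shift are the post-processing cost that the cheap mask is meant to offset.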

As an example from real life, Java's hash tables were initially implemented using prime (or almost prime) sizes, but from Java 1.4 on the design was changed to use a power-of-two number of buckets, with a second, fast hash function applied to the result of the initial hash. An interesting article commenting on that change can be found here.

So basically:

A prime number helps disperse the inputs across the different buckets even with a not-so-good hash function.
A similar effect can be achieved by post-processing the result of the hash function and using a power-of-2 size, which replaces the modulo operation with a fast bit mask and so compensates for the cost of the post-processing.

169

answered Oct 22 '22 00:10

David Rodríguez - dribeas
