Why do we use prime numbers in hash functions?

This means we have to choose a prime number that doesn't divide our keys; choosing a large prime number is usually enough. At the risk of being repetitive, the reason prime numbers are used is to neutralize the effect of patterns in the keys on the distribution of collisions of a hash function.

Why is 31 a good number for a hash table?

There are perhaps a couple of reasons for choosing 31. The main one is that it is a prime number, and prime numbers tend to give better distribution in hashing algorithms; in other words, the hash outputs have fewer collisions for different inputs.

What is the collision probability when the hash table size is prime?

TL;DR: a hash table size that is a prime number is the simplest way to guarantee that you do not accidentally re-probe a previously probed location. If your hash function is of the form h(k) = (a × k) mod m, where m is prime and a is chosen at random, then the probability that two distinct keys hash to the same bucket is 1/m.
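The re-probing point can be checked with a small sketch. It assumes a probe sequence of the form (start + i × step) mod m, the usual shape for double hashing; the class and method names are hypothetical and not taken from any library.

```java
import java.util.HashSet;
import java.util.Set;

class ProbeCoverage {
    // Counts how many distinct slots the sequence (start + i * step) mod m
    // visits during m probes. Assumes m > 0 and small non-negative arguments.
    static int distinctSlots(int m, int start, int step) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < m; i++) {
            seen.add((start + i * step) % m);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Table size 12 with step 4: the sequence cycles through 0, 4, 8 only.
        System.out.println(distinctSlots(12, 0, 4)); // 3
        // Table size 13 (prime) with the same step: every slot is visited.
        System.out.println(distinctSlots(13, 0, 4)); // 13
    }
}
```

What decides full coverage is that the step and the table size share no common factor. A prime size gives that property for every step between 1 and m - 1, which is why it is a convenient choice.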

Why do we use prime numbers for randomization?

The reason why prime numbers are used is to minimize collisions when the data exhibits some particular patterns. First things first: If the data is random then there’s no need for a prime number, you can do a mod operation against any number and you will have the same number of collisions for each possible value of the modulus.

Why use a prime number in hashCode?

Why should hash tables be prime?

Prime numbers are famously divisible only by 1 and themselves. Thus, setting your hash table length to a large prime number can greatly reduce the occurrence of collisions when the keys follow a pattern.

Why does Java use 31 in hashCode() for String?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
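The information loss with an even multiplier can be seen in a minimal sketch. It assumes a polynomial hash of the same shape as `String.hashCode()` but with a configurable multiplier; the class and method names are hypothetical.

```java
class MultiplierOverflow {
    // Polynomial string hash: h = multiplier * h + nextChar, in wrapping
    // 32-bit int arithmetic. With multiplier 31 this matches String.hashCode().
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "a1234567";
        String b = "b1234567";
        // 32 is 2^5, so after seven multiplications the first character has
        // been shifted 35 bits left and no longer affects the 32-bit result.
        System.out.println(hash(a, 32) == hash(b, 32)); // true
        // 31 is odd, so the first character still influences the result.
        System.out.println(hash(a, 31) == hash(b, 31)); // false
    }
}
```

This only illustrates why the multiplier should be odd; it says nothing about whether it also needs to be prime.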

Why is hashCode() used?

hashCode in Java helps programs run faster. Comparing two objects by their hash codes is typically much cheaper than comparing them with equals() (one figure quoted is 20 times faster), although equal hash codes do not by themselves prove that two objects are equal. The speedup comes from hash data structures such as HashMap, which internally organize their elements in an array-based structure indexed by hash code.

Why should the table size be a prime number when an array is used to implement a hash table with open addressing and double hashing?

Doubling the size is done to minimize collision, which occurs when multiple keys map to the same bucket. Fewer collisions provide faster search time.

Prime numbers are chosen to best distribute data among hash buckets. If the distribution of inputs is random and evenly spread, then the choice of the hash code/modulus does not matter. It only has an impact when there is a certain pattern to the inputs.

This is often the case when dealing with memory locations. For example, all 32-bit integers are aligned to addresses divisible by 4. Check out the table below to visualize the effects of using a prime vs. non-prime modulus:

Input       Modulo 8    Modulo 7
0           0           0
4           4           4
8           0           1
12          4           5
16          0           2
20          4           6
24          0           3
28          4           0

Notice the almost-perfect distribution when using a prime modulus vs. a non-prime modulus.
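The table can be reproduced with a short sketch that counts how many inputs land in each bucket. The inputs are the multiples of 4 from 0 to 28 used in the table; the class and method names are hypothetical.

```java
import java.util.Arrays;

class ModulusDistribution {
    // Counts how many of the inputs 0, 4, 8, ..., 28 fall into each bucket
    // when the bucket index is input % modulus.
    static int[] bucketCounts(int modulus) {
        int[] counts = new int[modulus];
        for (int input = 0; input <= 28; input += 4) {
            counts[input % modulus]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Modulus 8: only buckets 0 and 4 are ever used.
        System.out.println(Arrays.toString(bucketCounts(8))); // [4, 0, 0, 0, 4, 0, 0, 0]
        // Modulus 7: every bucket is used, and only bucket 0 is hit twice.
        System.out.println(Arrays.toString(bucketCounts(7))); // [2, 1, 1, 1, 1, 1, 1]
    }
}
```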

Although the example above is largely contrived, the general principle holds: when the inputs follow a pattern, a prime modulus tends to yield the best distribution.

The reason is that you want the number you are multiplying by and the number of buckets you are inserting into to share no prime factors, that is, to be coprime.

Suppose there are 8 buckets to insert into. If the number you are using to multiply by is some multiple of 8, then the bucket inserted into will only be determined by the least significant entry (the one not multiplied at all). Similar entries will collide. Not good for a hash function.
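A minimal sketch of the 8-bucket case, assuming a polynomial string hash of the form h = multiplier × h + c reduced modulo the bucket count; the class and method names are hypothetical.

```java
class MultiplierVsBuckets {
    // Hashes s with h = multiplier * h + nextChar, then maps the result to a
    // bucket index in [0, buckets). floorMod keeps the index non-negative.
    static int bucket(String s, int multiplier, int buckets) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return Math.floorMod(h, buckets);
    }

    public static void main(String[] args) {
        String[] words = {"cat", "bat", "hat"};
        for (String w : words) {
            // Multiplier 8 with 8 buckets: only the last character ('t')
            // decides the bucket, so all three words land in bucket 4.
            // Multiplier 31: the buckets are 6, 5 and 3 respectively.
            System.out.println(w + " -> " + bucket(w, 8, 8) + " vs " + bucket(w, 31, 8));
        }
    }
}
```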

31 is a large enough prime that the number of buckets is unlikely to be divisible by it (and in fact, modern Java HashMap implementations keep the number of buckets at a power of 2).

For what it's worth, Effective Java 2nd Edition hand-waves around the mathematics and just says that the reasons to choose 31 are:

Because it's an odd prime, and it's "traditional" to use primes
It's also one less than a power of two, which permits a bitwise optimization

Here's the full quote, from Item 9: Always override hashCode when you override equals:

The value 31 was chosen because it's an odd prime. If it were even and multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

A nice property of 31 is that the multiplication can be replaced by a shift (§15.19) and subtraction for better performance:

 31 * i == (i << 5) - i
Modern VMs do this sort of optimization automatically.
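The identity can be checked directly, including at the extremes of the `int` range where the arithmetic wraps around. This is a small sketch with hypothetical class and method names.

```java
class ShiftIdentity {
    // Returns true when 31 * i and (i << 5) - i agree in 32-bit int arithmetic.
    static boolean holds(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            // Both sides wrap modulo 2^32 in the same way, so each line prints true.
            System.out.println(i + ": " + holds(i));
        }
    }
}
```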

While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and theoretical computer scientists.

Perhaps a later release of the platform will provide state-of-the-art hash functions for its classes and utility methods to allow average programmers to construct such hash functions. In the meantime, the techniques described in this item should be adequate for most applications.
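For readers who have not seen the kind of recipe the quote refers to, here is a minimal sketch. The class `LabeledPoint` and its fields are hypothetical, and the starting constant 17 is an assumption based on the commonly cited form of the recipe.

```java
final class LabeledPoint {
    private final int x;
    private final int y;
    private final String label; // may be null

    LabeledPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LabeledPoint)) return false;
        LabeledPoint other = (LabeledPoint) o;
        return x == other.x && y == other.y
                && (label == null ? other.label == null : label.equals(other.label));
    }

    @Override
    public int hashCode() {
        // Combine each field that equals() compares, multiplying by 31 per step.
        int result = 17; // assumed nonzero starting value
        result = 31 * result + x;
        result = 31 * result + y;
        result = 31 * result + (label == null ? 0 : label.hashCode());
        return result;
    }

    public static void main(String[] args) {
        LabeledPoint p = new LabeledPoint(1, 2, "a");
        LabeledPoint q = new LabeledPoint(1, 2, "a");
        // Equal objects must report equal hash codes.
        System.out.println(p.equals(q) && p.hashCode() == q.hashCode()); // true
    }
}
```

The important design point is that `hashCode()` uses exactly the fields that `equals()` compares, so equal objects always hash alike.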

Rather simplistically, it can be said that using a multiplier with numerous divisors will result in more hash collisions. Since for effective hashing we want to minimize the number of collisions, we try to use a multiplier that has fewer divisors. A prime number by definition has exactly two distinct, positive divisors.

I heard that 31 was chosen so that the compiler can optimize the multiplication to left-shift 5 bits then subtract the value.

Here's a citation a little closer to the source.

It boils down to:

31 is prime, which reduces collisions
31 produces a good distribution, with a reasonable tradeoff in speed
