 

Java: A "prime" number or a "power of two" as HashMap size?

Many books and tutorials say that the size of a hash table must be a prime to evenly distribute the keys in all the buckets. But Java's HashMap always uses a size that is a power of two. Shouldn't it be using a prime? What's better, a "prime" or a "power of two" as the hash table size?

asked Mar 15 '13 by Nikunj Banka

People also ask

Why HashMap capacity is power of 2?

So, basically the point is: if the size is a power of two, the bucket index can be computed with a cheap bit mask and the keys will be distributed evenly across the array with minimal collisions, leading to better retrieval performance (and also less synchronization in the case of ConcurrentHashMap) than with a size that is not a power of 2.
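A minimal sketch (not the actual JDK source) of why a power-of-two capacity is convenient: the bucket index can be computed with a single bitwise AND instead of a modulo operation.

```java
// Sketch: with a power-of-two capacity, (hash & (capacity - 1)) is
// equivalent to a non-negative modulo, but much cheaper.
public class IndexDemo {
    public static void main(String[] args) {
        int capacity = 16; // power of two
        int hash = "someKey".hashCode();
        int index = hash & (capacity - 1);
        // Same result as Math.floorMod(hash, capacity), even for negative hashes.
        System.out.println(index == Math.floorMod(hash, capacity)); // prints "true"
    }
}
```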

Does hashing use prime numbers?

Primes are used because a typical hash function built from polynomials modulo P has a good chance of producing a unique value. Say you use such a hash function for strings of length <= N and you get a collision: that means two different polynomials produce the same value modulo P.

How HashMap increases its size in Java?

As the number of elements in the HashMap increases, the capacity is expanded. The load factor is the measure that decides when to increase the capacity of the Map. The default load factor is 75% of the capacity. The threshold of a HashMap is approximately the product of current capacity and load factor.
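The resize arithmetic described above can be sketched with HashMap's documented defaults (initial capacity 16, load factor 0.75):

```java
// Sketch of HashMap's resize threshold arithmetic with default settings.
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12: a resize is triggered once size exceeds this
        capacity <<= 1;                // on resize, capacity doubles and stays a power of two
        System.out.println(capacity);  // 32
    }
}
```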

Why is prime number preferred as the size of hash table?

Primes are famously divisible only by 1 and themselves, so a prime table length does not share factors with patterns in the hash values. Thus, setting your hash table length to a large prime number reduces the occurrence of collisions, particularly when the hash function is of poor quality.


2 Answers

Using a power of two as the table size effectively masks out the top bits of the hash code, since only the low bits select the bucket. A poor-quality hash function can therefore perform particularly badly in this scenario.
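A small illustration of this pitfall: two hypothetical hash codes that differ only in their upper bits collide in a 16-slot table, because masking discards everything above the low 4 bits.

```java
// Hypothetical hash codes differing only in the upper bits:
public class MaskPitfall {
    public static void main(String[] args) {
        int capacity = 16;
        int h1 = 0x10000;
        int h2 = 0x20000;
        // Masking keeps only the low 4 bits, which are identical (zero):
        System.out.println(h1 & (capacity - 1)); // 0
        System.out.println(h2 & (capacity - 1)); // 0 -- both land in bucket 0
    }
}
```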

Java's HashMap mitigates this by mistrusting the object's hashCode() implementation and applying a second level of hashing to its result:

Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.
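In recent JDKs (Java 8 and later) this supplemental step is essentially a bit-spreading XOR. The snippet below is a simplified sketch of that idea, not a verbatim copy of the JDK source:

```java
// Simplified sketch of the bit-spreading step used by modern HashMap:
// XOR the high 16 bits into the low 16 bits, so hashCodes that differ
// only in their upper bits still produce different bucket indices.
public class SpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;
        int h1 = 0x10000, h2 = 0x20000; // differ only above the low 16 bits
        System.out.println(spread(h1) & (capacity - 1)); // 1
        System.out.println(spread(h2) & (capacity - 1)); // 2 -- collision resolved
    }
}
```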

If you have a good hash function, or do something similar to what HashMap does, it does not matter whether the table size is prime, a power of two, or anything else.

If, on the other hand, the hash function is of unknown or poor quality, then using a prime number would be a safer bet. It will, however, make dynamically-sized tables trickier to implement, since all of a sudden you need to be able to produce prime numbers instead of just multiplying the size by a constant factor.
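A sketch of what the prime-sized alternative described above might look like; `indexFor`, `isPrime`, and `nextPrime` are illustrative helper names, not part of any standard library:

```java
// Illustrative prime-sized table helpers (assumed names, not JDK API):
public class PrimeSized {
    // Index modulo a prime capacity; floorMod handles negative hash codes.
    static int indexFor(int hash, int primeCapacity) {
        return Math.floorMod(hash, primeCapacity);
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // Growing means searching for the next prime, not just doubling.
    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    public static void main(String[] args) {
        // Grow a 17-slot table to at least twice its size:
        System.out.println(nextPrime(2 * 17)); // 37
    }
}
```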

answered Sep 20 '22 by NPE


The standard HashMap implementation has a hash method which re-hashes your object's hashCode() result to avoid that pitfall. The comment before the hash() method reads:

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
answered Sep 17 '22 by assylias