Why does HashMap require that the initial capacity be a power of two?

I was going through Java's HashMap source code when I saw the following

// The default initial capacity - MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 16;

My question is: why does this requirement exist in the first place? I also see that the constructor which allows creating a HashMap with a custom capacity converts it into a power of two:

int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

Why does the capacity always have to be a power of two?

Also, when automatic rehashing is performed, what exactly happens? Is the hash function altered too?
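(For reference, a minimal standalone sketch of what that rounding loop does for a few requested capacities; the helper name roundUpToPowerOfTwo is mine, not from the JDK:)

public class CapacityRounding {
    // Same loop as in the HashMap constructor: round the requested
    // capacity up to the next power of two.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}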

Sushant asked Dec 02 '11

People also ask

Why HashMap capacity is power of 2?

So, basically, the point is that if the size is a power of two, the keys will be more evenly distributed across the array with minimal collisions, leading to better retrieval performance (and also fewer synchronizations in the case of ConcurrentHashMap) compared with any size that is not a power of 2.

Why initial capacity of HashMap is 16?

The initial capacity of the HashMap is the number of buckets in the hash table. It is set when we create the object of the HashMap class. The initial capacity of the HashMap is 2^4, i.e., 16. The capacity of the HashMap is doubled each time it reaches the threshold.

What is the initial capacity of HashMap?

Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

What is the purpose of the initial capacity and load factor parameters of a HashMap, and what are their default values?

The initial capacity is the capacity at the time the Map is created. Finally, the default initial capacity of the HashMap is 16. As the number of elements in the HashMap increases, the capacity is expanded. The load factor is the measure that decides when to increase the capacity of the Map.
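(A small illustrative sketch of those two parameters in use; the class name and the printed size are mine, the constructor and the threshold arithmetic follow the description above:)

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Explicit capacity and load factor; these are also the defaults.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);

        // With capacity 16 and load factor 0.75, the resize threshold is
        // 16 * 0.75 = 12 entries; inserting the 13th entry triggers an
        // internal resize to 32 buckets (not observable from outside).
        for (int i = 0; i < 13; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 13
    }
}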


2 Answers

The map has to work out which internal table index to use for any given key, mapping any int value (could be negative) to a value in the range [0, table.length). When table.length is a power of two, that can be done really cheaply - and is, in indexFor:

static int indexFor(int h, int length) {
    return h & (length - 1);
}

With a different table length, you'd need to compute a remainder and make sure it's non-negative. This is definitely a micro-optimization, but probably a valid one :)
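(As a quick sanity check, here is my own sketch, not JDK code, of the bit-mask trick next to what you'd have to do for an arbitrary table length:)

public class IndexForDemo {
    // Power-of-two table: a single AND, and the result is always non-negative.
    static int indexForPowerOfTwo(int h, int length) {
        return h & (length - 1);
    }

    // Arbitrary table length: need a remainder, plus a sign fix-up because
    // Java's % can return a negative value for a negative h.
    static int indexForAnyLength(int h, int length) {
        return Math.floorMod(h, length);
    }

    public static void main(String[] args) {
        System.out.println(indexForPowerOfTwo(-7, 16)); // 9
        System.out.println(indexForAnyLength(-7, 13));  // 6
    }
}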

Also, when automatic rehashing is performed, what exactly happens? Is the hash function altered too?

It's not quite clear to me what you mean. The same hash codes are used (because they're just computed by calling hashCode on each key) but they'll be distributed differently within the table due to the table length changing. For example, when the table length is 16, hash codes of 5 and 21 both end up being stored in table entry 5. When the table length increases to 32, they will be in different entries.
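(To make that concrete, a tiny sketch, again mine, using the same masking idea, showing how 5 and 21 collide at table length 16 but separate at 32:)

public class RehashDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // At length 16, both hash codes map to bucket 5.
        System.out.println(indexFor(5, 16));  // 5
        System.out.println(indexFor(21, 16)); // 5

        // After resizing to 32, they land in different buckets.
        System.out.println(indexFor(5, 32));  // 5
        System.out.println(indexFor(21, 32)); // 21
    }
}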

Jon Skeet answered Sep 25 '22


The ideal situation is actually using prime number sizes for the backing array of a HashMap. That way your keys will be more naturally distributed across the array. However, this works with mod division, and that operation became slower and slower with every release of Java. In a sense, the power-of-2 approach is the worst table size you can imagine, because poor hashCode implementations are more likely to produce key collisions in the array.

Therefore you'll find another very important method in Java's HashMap implementation, hash(int), which compensates for poor hashCodes.
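(For illustration, the supplemental hashing in that era's HashMap looked roughly like the following; this is a sketch based on the JDK 6/7 source, and the exact shift amounts varied between releases:)

public class SupplementalHash {
    // Spreads the bits of a poor hashCode so that differences in the high
    // bits also affect the low bits used by the index mask.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        // Hash codes that differ only in their high bits would otherwise
        // all land in bucket 0 of a 16-bucket table.
        System.out.println(hash(0x10000) & 15);
        System.out.println(hash(0x20000) & 15);
        System.out.println(hash(0x30000) & 15);
    }
}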

M Platvoet answered Sep 23 '22