Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Are HashMaps Implemented Using Powers of Two?

Tags:

java

hashmap

My question is why hashmap bucket size is power of 2 and I have gone through many answers on stackoverflow but i am still not convinced. Reasons are:

  1. I have read that having capacity as power of 2 makes & operation more efficient to calculate index so my question how exactly it is useful here. I can have size which might be power of 3 , I can still perform & operation like this (hash)&(length-1) so why exactly it should be power of 2 ?

  2. Also if the capacity is not power of 2 why i need to do remainder operation?

like image 420
vikas tiwari Avatar asked Nov 28 '18 19:11

vikas tiwari


People also ask

Why do Hashmaps have O 1?

Hashtables seem to be O(1) because they have a small constant factor combined with their 'n' in the O(log(n)) being increased to the point that, for many practical applications, it is independent of the number of actual items you are using.

What is the purpose of Hashmaps?

HashMap is a data structure that uses a hash function to map identifying values, known as keys, to their associated values. It contains “key-value” pairs and allows retrieving value by key.

How are Hashmaps organized?

A HashMap has number of buckets (implemented as an array) in which to store entries. When an item is added to the map, it is assigned to a buckets based on a value derived of its hashCode and the bucket size of the HashMap . (Note that it's possible that the bucket is already occupied, which is called a collision.

Why are Hashmaps used in Java?

The HashMap class of the Java collections framework provides the functionality of the hash table data structure. It stores elements in key/value pairs. Here, keys are unique identifiers used to associate each value on a map. The HashMap class implements the Map interface.


2 Answers

When you subtract 1 from a number which is a power of 2, what you get is a number whose binary representation is all 1. E.g. 16 is a power of 2. If you subtract 1 from it, you get 15, whose binary representation is 1111. Now, if you do a bitwise AND of any number with 1111, you're going to get the last 4 bits of the number which, in other words, is equivalent to the modulo of the number by 16 (Division operation is usually an expensive operation. Hence, bitwise operation is usually preferred over division). These last 4 bits will evaluate to any number from 0 to 15 which are the indexes of your underlying array.

You could make the size 17 instead. In that case, after subtracting 1 from it, you'd get 16 which is 10000 in binary. Now you do a bit wise AND of a number with 16, you'll lose all bits of the number except the 5th bit from the end. So, regardless of the number you take, the array index will be either 16 or 0. That means you'd have lot of collisions, which in turn means poor performance. Instead of O(1) for retrieval, you'd need O(log n), because when collision occurs, all the nodes in a given bucket are going to be stored in a red black tree. Not only that. In case you are using a ConcurrentHashMap in a multithreaded environmemt, you'd experience lot of synchronizations, because all the new additions will end up in a very small number of buckets (only two - 0 and 16 in the above case) and when you add new nodes in a bucket that already has other nodes, the bucket will be locked to avoid data inconsistancies due to modifications by multiple threads. Therefore, other threads trying to add new nodes need to wait until the current thread release the lock.

Finally, I should also mention that the Java HashMap implementation also shifts 16 bits of the hash code of key to the right and does bitwise XOR with the original hash code before doing the bitwise AND with (length - 1) to ensure that the effect of the higher order bits are also captured.

So, basically the point is, if the size is a power of two, the keys will be more evenly distributed across the array with minimal collision leading to better retrieval performance (and also less synchronizations in case of ConcurrentHashMap) when compared with any other size which is not a power of 2.

like image 112
Saptarshi Basu Avatar answered Oct 19 '22 07:10

Saptarshi Basu


No matter what, you need to do a remainder operation to get a hash code (which can be any int) to map to an entry in the hash table.

In the case where m is a power of two -- and only that case -- a % m is equal to a & (m - 1). There is no other case in which remainders can be computed with &.

like image 5
Louis Wasserman Avatar answered Oct 19 '22 07:10

Louis Wasserman