Possible Duplicate:
Java HashMap Default Initial Capacity
I was reading the implementation of HashMap in java.util.HashMap. The initial capacity, maximum capacity, etc. are powers of two.
Parts of the declaration copied from java.util.HashMap:
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 16;
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The table, resized as necessary. Length MUST Always be a power of two.
*/
transient Entry[] table;
The comments state that these sizes MUST be a power of two. Why is a power of two given so much importance?
So, basically, the point is this: if the table size is a power of two, the keys are distributed more evenly across the array with fewer collisions, which gives better retrieval performance (and less contention in the case of ConcurrentHashMap) than any size that is not a power of two.
The load factor is the measure that decides when to increase the capacity of the map. The default load factor is 0.75, i.e. 75% of the capacity. The threshold of a HashMap is approximately the product of the current capacity and the load factor. Rehashing is the process of re-computing the bucket positions of already stored entries when the table grows.
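As a rough illustration of those numbers (the class name ThresholdDemo and the printouts are mine, not from the JDK): with the default capacity of 16 and load factor of 0.75 the threshold works out to 12, so adding a 13th mapping triggers a resize to 32 buckets.
import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor

        // The resize threshold is (approximately) capacity * loadFactor.
        int threshold = (int) (capacity * loadFactor);   // 16 * 0.75 = 12
        System.out.println("threshold = " + threshold);

        // Once the 13th mapping is added, the table doubles to 32 buckets
        // and the existing entries are rehashed into the larger table.
        Map<String, Integer> map = new HashMap<>(capacity, loadFactor);
        for (int i = 0; i < 13; i++) {
            map.put("key" + i, i);
        }
        System.out.println("size = " + map.size());      // 13
    }
}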
The initial capacity is essentially the number of buckets in the HashMap, which by default is 2^4 = 16. A good hash function distributes the elements evenly across all the buckets: with 16 elements, each bucket holds one node, and any element can be found with a single lookup.
HashMap in Java works on hashing principles. It is a data structure that lets us store an object and retrieve it in constant time, O(1), provided we know the key. In hashing, hash functions are used to link keys and values in the HashMap.
Using powers of two simplifies the implementation and improves its performance.
For example, to find a bucket from a hash code it can use hash & (SIZE - 1)
instead of abs(hash) % SIZE
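A minimal sketch of that equivalence (the class and variable names below are just illustrative): for a power-of-two size, masking with SIZE - 1 and a proper floor modulo always select the same bucket, even for negative hash codes, but the AND is a single cheap instruction.
public class BucketIndexDemo {
    public static void main(String[] args) {
        int size = 16;   // table length, a power of two
        int[] hashes = { 42, -17, 1_000_003, Integer.MIN_VALUE };

        for (int hash : hashes) {
            int byMask = hash & (size - 1);           // what HashMap effectively does
            int byMod  = Math.floorMod(hash, size);   // general modulo, handles negatives
            System.out.printf("hash=%d  mask=%d  mod=%d%n", hash, byMask, byMod);
            // For a power-of-two size the two values are always equal.
        }
    }
}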
Theoretically speaking, we can amortize the cost of expanding the table only if that work becomes negligible as the number of elements in the map increases. Doubling the size every time the threshold is reached is one way of ensuring that the expansion and redistribution of entries is amortized over the inserts.
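A simplified simulation of that doubling policy (this is not HashMap's actual resize code) shows why: the total number of entry moves across all resizes stays proportional to the number of inserts, so the amortized cost per insert is constant.
public class AmortizedResizeDemo {
    public static void main(String[] args) {
        int capacity = 16;
        double loadFactor = 0.75;
        int threshold = (int) (capacity * loadFactor);

        int size = 0;
        long moves = 0;              // total entries re-inserted across all resizes
        int inserts = 1_000_000;

        for (int i = 0; i < inserts; i++) {
            size++;
            if (size > threshold) {
                moves += size;                             // every stored entry moves once
                capacity <<= 1;                            // double the table
                threshold = (int) (capacity * loadFactor);
            }
        }
        // moves per insert settles to a small constant, independent of inserts
        System.out.printf("inserts=%d, moves=%d, moves per insert=%.2f%n",
                inserts, moves, (double) moves / inserts);
    }
}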
The reason why it is specifically a power of two is so that when we hash an element, the resulting 32-bit integer can be truncated to its low-order k bits, where k = log2(N) and N is the current capacity.
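Viewed in binary (again just an illustrative snippet, not JDK code), masking with capacity - 1 keeps exactly those low-order k bits.
public class LowBitsDemo {
    public static void main(String[] args) {
        int capacity = 16;                                // N, the current capacity
        int k = Integer.numberOfTrailingZeros(capacity);  // k = log2(N) = 4
        int mask = capacity - 1;                          // 0b1111, keeps the low k bits

        int hash = 0b1011_0110_1101;                      // some 32-bit hash code
        int index = hash & mask;                          // only the low 4 bits survive

        System.out.println("k     = " + k);
        System.out.println("hash  = " + Integer.toBinaryString(hash));
        System.out.println("mask  = " + Integer.toBinaryString(mask));
        System.out.println("index = " + Integer.toBinaryString(index));
    }
}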