Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashMap resize method implementation detail

Tags:

As the title suggests this is a question about an implementation detail from HashMap#resize - that's when the inner array is doubled in size. It's a bit wordy, but I've really tried to prove that I did my best understanding this...

This happens at a point when entries in this particular bucket/bin are stored in a Linked fashion - thus having an exact order and in the context of the question this is important.

Generally the resize could be called from other places as well, but let's look at this case only.

Suppose you put these strings as keys in a HashMap (on the right there's the hashcode after HashMap#hash - that's the internal re-hashing.) Yes, these are carefully generated, not random.

 DFHXR - 11111  YSXFJ - 01111   TUDDY - 11111   AXVUH - 01111   RUTWZ - 11111  DEDUC - 01111  WFCVW - 11111  ZETCU - 01111  GCVUR - 11111  

There's a simple pattern to notice here - the last 4 bits are the same for all of them - which means that when we insert 8 of these keys (there are 9 total), they will end-up in the same bucket; and on the 9-th HashMap#put, the resize will be called.

So if currently there are 8 entries (having one of the keys above) in the HashMap - it means there are 16 buckets in this map and the last 4 bits of they key decided where the entries end-up.

We put the nine-th key. At this point TREEIFY_THRESHOLD is hit and resize is called. The bins are doubled to 32 and one more bit from the keys decides where that entry will go (so, 5 bits now).

Ultimately this piece of code is reached (when resize happens):

 Node<K,V> loHead = null, loTail = null;  Node<K,V> hiHead = null, hiTail = null;  Node<K,V> next;  do {      next = e.next;      if ((e.hash & oldCap) == 0) {           if (loTail == null)                loHead = e;           else                loTail.next = e;           loTail = e;      }      else {         if (hiTail == null)             hiHead = e;         else             hiTail.next = e;         hiTail = e;      }  } while ((e = next) != null);     if (loTail != null) {      loTail.next = null;      newTab[j] = loHead;  }  if (hiTail != null) {      hiTail.next = null;      newTab[j + oldCap] = hiHead;  } 

It's actually not that complicated... what it does it splits the current bin into entries that will move to other bins and to entries that will not move to other bins - but will remain into this one for sure.

And it's actually pretty smart how it does that - it's via this piece of code:

 if ((e.hash & oldCap) == 0)  

What this does is check if the next bit (the 5-th in our case) is actually zero - if it is, it means that this entry will stay where it is; if it's not it will move with a power of two offset in the new bin.

And now finally the question: that piece of code in the resize is carefully made so that it preserves the order of the entries there was in that bin.

So after you put those 9 keys in the HashMap the order is going to be :

DFHXR -> TUDDY -> RUTWZ -> WFCVW -> GCVUR (one bin)  YSXFJ -> AXVUH -> DEDUC -> ZETCU (another bin) 

Why would you want to preserve order of some entries in the HashMap. Order in a Map is really bad as detailed here or here.

like image 708
Eugene Avatar asked Jul 30 '17 20:07

Eugene


People also ask

What happens when HashMap is resized?

In Oracle JDK 8, HashMap resizes when the size is > threshold (capacity * load factor). With capacity of 16 and default load factor of 0.75 , resizing (to capacity of 32 ) takes place when the 13 th entry is put.

What is HashMap and its implementation?

Hashmap uses the array of Nodes(named as table), where Node has fields like the key, value (and much more). Here the Node is represented by class HashMapEntry. Basically, HashMap has an array where the key-value data is stored. It calculates the index in the array where the Node can be placed and it is placed there.

What is the implementation of the HashMap How does it handle collisions?

1) HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location. 2) From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions.


1 Answers

The design consideration has been documented within the same source file, in a code comment in line 211

* When bin lists are treeified, split, or untreeified, we keep  * them in the same relative access/traversal order (i.e., field  * Node.next) to better preserve locality, and to slightly  * simplify handling of splits and traversals that invoke  * iterator.remove. When using comparators on insertion, to keep a  * total ordering (or as close as is required here) across  * rebalancings, we compare classes and identityHashCodes as  * tie-breakers.  

Since removing mappings via an iterator can’t trigger a resize, the reasons to retain the order specifically in resize are “to better preserve locality, and to slightly simplify handling of splits”, as well as being consistent regarding the policy.

like image 122
Holger Avatar answered Oct 18 '22 20:10

Holger