How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code? I read that keys should implement Comparable
to define an ordering. How does HashMap combine hashing and natural ordering to implement the trees? What about classes that do not implement Comparable
, or when multiple, non-mutually-comparable Comparable
implementations are keys in the same map?
If two keys are the same ( equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
1) HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location. 2) From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions.
In Java 8, HashMap replaces linked list with a binary tree when the number of elements in a bucket reaches certain threshold. While converting the list to binary tree, hashcode is used as a branching variable.
Collisions in the HashMap A collision, or more specifically, a hash code collision in a HashMap, is a situation where two or more key objects produce the same final hash value and hence point to the same bucket location or array index.
The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable" type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
When two objects have equal hash codes but are not mutually comparable, method tieBreakOrder
is invoked to break the tie, first by string comparison on getClass().getName()
(!), then by comparing System.identityHashCode
.
The actual tree building starts in treeifyBin
, beginning when a bin reaches TREEIFY_THRESHOLD
(currently 8), assuming the hash table has at least MIN_TREEIFY_CAPACITY
capacity (currently 64). It's a mostly-normal red-black tree implementation (crediting CLR), with some complications to support traversal in the same way as hash bins (e.g., removeTreeNode
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With