Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code?

How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code? I read that keys should implement Comparable to define an ordering. How does HashMap combine hashing and natural ordering to implement the trees? What about classes that do not implement Comparable, or when multiple, non-mutually-comparable Comparable implementations are keys in the same map?

like image 474
Thennam Avatar asked May 11 '15 09:05

Thennam


People also ask

What happens if two keys have the same hashCode in HashMap?

If two keys are the same ( equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).

How does Java deal with hash collision in HashMap?

1) HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location. 2) From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions.

What change happened in HashMap in Java 8?

In Java 8, HashMap replaces linked list with a binary tree when the number of elements in a bucket reaches certain threshold. While converting the list to binary tree, hashcode is used as a branching variable.

How hash collision occurs in HashMap?

Collisions in the HashMap A collision, or more specifically, a hash code collision in a HashMap, is a situation where two or more key objects produce the same final hash value and hence point to the same bucket location or array index.


1 Answers

The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable" type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

When two objects have equal hash codes but are not mutually comparable, method tieBreakOrder is invoked to break the tie, first by string comparison on getClass().getName() (!), then by comparing System.identityHashCode.

The actual tree building starts in treeifyBin, beginning when a bin reaches TREEIFY_THRESHOLD (currently 8), assuming the hash table has at least MIN_TREEIFY_CAPACITY capacity (currently 64). It's a mostly-normal red-black tree implementation (crediting CLR), with some complications to support traversal in the same way as hash bins (e.g., removeTreeNode).

like image 119
Jeffrey Bosboom Avatar answered Sep 23 '22 12:09

Jeffrey Bosboom