I was going through java 8 features and found out that hashmaps use a red black tree instead of a linkedlist when the number of entry sets on the bucket increases.
However, doesn't this require the key to be Comparable or some ordering of the keys to exist and how does this work ? When does this conversion actually happens and how ?
HashMap implements Hashing, while TreeMap implements Red-Black Tree(a Self Balancing Binary Search Tree). Therefore all differences between Hashing and Balanced Binary Search Tree apply here. Both HashMap and TreeMap have their counterparts HashSet and TreeSet. HashSet and TreeSet implement Set interface.
HashMap stores elements in so-called buckets and the number of buckets is called capacity. When we put a value in the map, the key's hashCode() method is used to determine the bucket in which the value will be stored. To retrieve the value, HashMap calculates the bucket in the same way – using hashCode().
HashMap allows a single null key and multiple null values. TreeMap does not allow null keys but can have multiple null values. HashMap allows heterogeneous elements because it does not perform sorting on keys. TreeMap allows homogeneous values as a key because of sorting.
In Java 8, HashMap replaces linked list with a binary tree when the number of elements in a bucket reaches certain threshold.
When there are at least 8 entries (TREEIFY_THRESHOLD
) in a single bucket and the total number of buckets is more then 64 (MIN_TREEIFY_CAPACITY
) then that single bucket will be transformed to a perfectly balanced red black tree node.
There is also the shrinkage that you should be aware of (if you want) that happens when you remove entries (UNTREEIFY_THRESHOLD
== 6).
You are correct that keys should be Comparable
- but that is not always required, it is good if they are (in case they have the same hashCode
), but in case they are not, this is used:
static int tieBreakOrder(Object a, Object b) {
int d;
if (a == null || b == null ||
(d = a.getClass().getName().
compareTo(b.getClass().getName())) == 0)
d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
-1 : 1);
return d;
}
So the className as a String
is used for comparison and if that fails too, then System.identityHashCode
is used (Marsaglia XOR-Shift algorithm) to decide the left and right.
To answer you question when this happens - when resize is called. When you have to resize your HashMap
- there are some things happening; like the number of buckets increases by a factor of two (one more bit is taken into consideration where an entry will move or not) or a certain bucket is transformed to a tree. This process (again if you really care) is pretty slow, some people say that a Java HashMap is "sloooooooow, then is fast fast fast; then it's sloooooo, then it's fast fast fast" (I still see this as mocking, but there are PauselessHashMap
implementations).
This brings two interesting points. First is choose the correct size of your Map
initially (even a rough estimation will do), i.e.:
new HashMap<>(256); // choosing the size
this will avoid some resizes.
The second one is why transforming to a Tree
is important (think database indexes and why they are BTREE
...). How many steps it would take you to find an entry in a perfect tree that has INTEGER.MAX_VALUE entries (theoretically). Only up to 32.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With