Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When and how does HashMap convert the bucket from linked list to Red Black Trees? [duplicate]

I was going through java 8 features and found out that hashmaps use a red black tree instead of a linkedlist when the number of entry sets on the bucket increases.

However, doesn't this require the key to be Comparable or some ordering of the keys to exist and how does this work ? When does this conversion actually happens and how ?

like image 388
Naveen Goyal Avatar asked Dec 21 '17 09:12

Naveen Goyal


People also ask

Does HashMap uses red-black tree?

HashMap implements Hashing, while TreeMap implements Red-Black Tree(a Self Balancing Binary Search Tree). Therefore all differences between Hashing and Balanced Binary Search Tree apply here. Both HashMap and TreeMap have their counterparts HashSet and TreeSet. HashSet and TreeSet implement Set interface.

How buckets are created in HashMap?

HashMap stores elements in so-called buckets and the number of buckets is called capacity. When we put a value in the map, the key's hashCode() method is used to determine the bucket in which the value will be stored. To retrieve the value, HashMap calculates the bucket in the same way – using hashCode().

What is the correct difference between HashMap and TreeMap?

HashMap allows a single null key and multiple null values. TreeMap does not allow null keys but can have multiple null values. HashMap allows heterogeneous elements because it does not perform sorting on keys. TreeMap allows homogeneous values as a key because of sorting.

Does HashMap use tree?

In Java 8, HashMap replaces linked list with a binary tree when the number of elements in a bucket reaches certain threshold.


1 Answers

When there are at least 8 entries (TREEIFY_THRESHOLD) in a single bucket and the total number of buckets is more then 64 (MIN_TREEIFY_CAPACITY) then that single bucket will be transformed to a perfectly balanced red black tree node.

There is also the shrinkage that you should be aware of (if you want) that happens when you remove entries (UNTREEIFY_THRESHOLD == 6).

You are correct that keys should be Comparable - but that is not always required, it is good if they are (in case they have the same hashCode), but in case they are not, this is used:

 static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().
             compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
                 -1 : 1);
        return d;
 }

So the className as a String is used for comparison and if that fails too, then System.identityHashCode is used (Marsaglia XOR-Shift algorithm) to decide the left and right.

To answer you question when this happens - when resize is called. When you have to resize your HashMap - there are some things happening; like the number of buckets increases by a factor of two (one more bit is taken into consideration where an entry will move or not) or a certain bucket is transformed to a tree. This process (again if you really care) is pretty slow, some people say that a Java HashMap is "sloooooooow, then is fast fast fast; then it's sloooooo, then it's fast fast fast" (I still see this as mocking, but there are PauselessHashMap implementations).

This brings two interesting points. First is choose the correct size of your Map initially (even a rough estimation will do), i.e.:

 new HashMap<>(256); // choosing the size

this will avoid some resizes.

The second one is why transforming to a Tree is important (think database indexes and why they are BTREE...). How many steps it would take you to find an entry in a perfect tree that has INTEGER.MAX_VALUE entries (theoretically). Only up to 32.

like image 119
Eugene Avatar answered Oct 05 '22 08:10

Eugene