 

Why does HashMap resize when a bin hits TREEIFY_THRESHOLD, which seems unnecessary?

I know how HashMap works internally. But while reading the HashMap code with the TreeNode implementation, I don't understand the goal behind increasing the table size instead of treeifying a bin until the table size reaches MIN_TREEIFY_CAPACITY = 64.

Note: I'm considering Map m = new HashMap(); so the default table size is 16.

Default values:

static final int TREEIFY_THRESHOLD = 8;
static final int MIN_TREEIFY_CAPACITY = 64;

HashMap#putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)

I extracted a few lines from the putVal method.

else {
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}

So when binCount reaches 7 (TREEIFY_THRESHOLD - 1), that is, when the 9th node is appended to a bin, it calls treeifyBin(tab, hash).
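To make that trigger concrete, here is a minimal sketch; CollidingKey is a hypothetical class made up for illustration, whose constant hash code forces every entry into a single bin:

import java.util.HashMap;
import java.util.Map;

public class TreeifyTriggerDemo {
    // Hypothetical key: a constant hash code forces every entry into one bin.
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = new HashMap<>(); // table length 16
        for (int i = 1; i <= 9; i++)
            map.put(new CollidingKey(i), i);
        // On the 9th put the bin already holds 8 nodes, so binCount reaches
        // TREEIFY_THRESHOLD - 1 = 7 and putVal calls treeifyBin(tab, hash).
        // Because the table length 16 is below MIN_TREEIFY_CAPACITY,
        // treeifyBin resizes instead of building a tree.
    }
}

Now let's follow the code into the treeifyBin method.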

HashMap#treeifyBin(Node<K,V>[] tab, int hash)

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        ....
    }
}

Why? The first if in this method checks whether the current table length is less than MIN_TREEIFY_CAPACITY = 64 and, if so, calls resize(). That internally doubles the table size from the default 16 to 32 and transfers all elements to the new table, and later again from 32 to 64, which looks to me like unnecessary overhead.

So what's the goal behind this: checking the bin length against TREEIFY_THRESHOLD in putVal, but not treeifying until the table length reaches MIN_TREEIFY_CAPACITY?
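To watch this happen, here is a hedged sketch that reads HashMap's internal table length via reflection (implementation-specific; on JDK 9+ it needs --add-opens java.base/java.util=ALL-UNNAMED). Integer keys that are multiples of 64 all map to bucket 0 for table lengths 16, 32 and 64:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class ResizeBeforeTreeifyDemo {
    // Reads HashMap's internal table length via reflection.
    static int tableLength(Map<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, Integer> map = new HashMap<>();
        // Multiples of 64 share bucket 0 for table lengths 16, 32 and 64
        // (HashMap's hash spreading leaves such small values unchanged).
        for (int i = 0; i < 12; i++) {
            map.put(i * 64, i);
            System.out.println("entries=" + map.size()
                    + " table=" + tableLength(map));
        }
        // Expected: the table stays at 16 up to 8 entries, jumps to 32 on
        // the 9th put and to 64 on the 10th; only from the 11th colliding
        // put on can the bin actually be converted into a tree.
    }
}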

asked Dec 23 '22 by Vicky Thakor

1 Answer

Both measures, using a tree and using a larger-than-usual capacity, deal with collisions. When there are multiple keys mapped to the same bucket, it can be one of the following scenarios, or a combination of them (the sketch after the list makes scenarios 2 and 3 concrete):

  1. The keys have different hash codes but got mapped to the same bucket
  2. The keys have the same hash code but implement Comparable
  3. The keys have the same hash code and do not implement Comparable
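As a hedged illustration, here are two hypothetical key classes, made up for this answer; instances of either class always collide, but only the first gives a treeified bin an ordering to work with:

// Scenario 2: same hash code, but Comparable. A treeified bin can order
// these keys and find them in O(log n).
final class ComparableKey implements Comparable<ComparableKey> {
    final int id;
    ComparableKey(int id) { this.id = id; }
    @Override public int hashCode() { return 1; }
    @Override public boolean equals(Object o) {
        return o instanceof ComparableKey && ((ComparableKey) o).id == id;
    }
    @Override public int compareTo(ComparableKey other) {
        return Integer.compare(id, other.id);
    }
}

// Scenario 3: same hash code and not Comparable. Neither resizing nor the
// tree's ordering helps, and lookups in the bin degrade toward O(n).
final class OpaqueKey {
    final int id;
    OpaqueKey(int id) { this.id = id; }
    @Override public int hashCode() { return 1; }
    @Override public boolean equals(Object o) {
        return o instanceof OpaqueKey && ((OpaqueKey) o).id == id;
    }
}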

Neither approach can deal with point three. Only building a tree can deal with the second. When we have the first scenario, expanding the table may solve the issue, and if it does, it has the advantage of still providing an O(1) lookup and allowing more efficient traversal (just iterating over the array), whereas the tree has an O(log n) lookup and less efficient traversal, requiring a descent through the tree structure.

The problem is that analyzing the scenario, to find out which solution is applicable and whether expanding the table would actually help, takes time of its own. Further, it wouldn't pay off if a single put took on the expense of the analysis only to dismiss a strategy, just for the next put to find that strategy suitable for another key (after all, expanding the table has an impact on the entire table).

So a heuristic is used that accommodates likelihoods and typical use cases of HashMap, incorporating not just a single put operation. Note that for small table sizes, the chances of solving a bucket collision via expansion are higher: a table size of 16 implies using only four bits of the hash code, whereas a table size of 32 implies using five bits, 25% more.
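A quick sketch of the index computation (n - 1) & hash illustrates this; the hash values are arbitrary examples, and the real HashMap additionally spreads the hash with h ^ (h >>> 16), which leaves such small values unchanged:

public class BucketIndexDemo {
    public static void main(String[] args) {
        int h1 = 0b0_0101;  // 5
        int h2 = 0b1_0101;  // 21: differs from h1 only in the fifth bit

        // A table of length 16 masks with 0b01111: only four hash bits
        // count, so both keys land in bucket 5.
        System.out.println((16 - 1) & h1); // 5
        System.out.println((16 - 1) & h2); // 5

        // Doubling to 32 masks with 0b11111: the fifth bit now matters
        // and the collision disappears.
        System.out.println((32 - 1) & h1); // 5
        System.out.println((32 - 1) & h2); // 21
    }
}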

I suppose the JDK team used the usual approach of benchmarking real-life applications and libraries to find the right trade-off.

answered Mar 15 '23 by Holger