Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java HashMap put() implementation. Why not check references first?

java.util.HashMap has an implementation of the put method, which has the following code inside it :

if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
}

In the above code why wasn't the reference check made first (since two objects having the same reference will have the same hash and equals()) ?

i.e. something like this :

if ((k = e.key) == key) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
} else if ( compare hash and equals) {
    // do something again with the value
}

Wouldn't this have saved a comparision?

like image 318
Ace McCloud Avatar asked Jun 16 '14 11:06

Ace McCloud


2 Answers

I don't know why, but a naïve microbenchmark suggests that on Oracle's VM (Intel, 32 or 64 bit), comparing two references takes about four times as long as comparing two ints (as in hash codes). I would have assumed that comparing two 32-bit integers and two address pointers should have had similar runtime cost on modern hardware, but I am probably just not considering something obvious here.

Assuming that different keys in most cases have different hash codes, comparing the hash before the key saves 75% runtime for each incorrect key and adds 25% runtime for the correct key. If this actually saves overall runtime depends of course on the exact content and layout of the hash map tables, but the Sun engineers obviously thought that this variant is better for most purposes.

Methods used for benchmarking:

public static int c1(int a, int b, int iter) {
    int r = 0;
    while((iter--)>0) {
        if(a == b) {
            r++;
        }
    }
    return r;
}

public static int c2(Object a, Object b, int iter) {
    int r = 0;
    while((iter--)>0) {
        if(a == b) {
            r++;
        }
    }
    return r;
}
like image 131
jarnbjo Avatar answered Oct 20 '22 17:10

jarnbjo


The operations if_icmpne (compare of integer) and if_acmpne (compare of reference) use the same technique to obtain result [1,2,3,4].

Both have prepared values on stack and consumes form it equally. There should not be significant difference in operations required. Both will be done in single CPU cycle.

In order to state that map can store the object in the same bucket there must be validate two rules.

  • Their hashcode must be equal
  • The must return true when x.equals(y)

IMHO the code reflect those rules and i could be written as

if (e.hash == hash && key.equals(k))

In order to satisfy map requirement we must always validate hashes and equals.

So for performance reason the part key.equals(k) was optimized with (k = e.key) == key giving

((k = e.key) == key || key.equals(k))

This implementation mean that for maps we valuate more hashes and equals as then reference equality. So this is expected behaviour.

like image 32
Damian Leszczyński - Vash Avatar answered Oct 20 '22 17:10

Damian Leszczyński - Vash