Java String hashCode caching mechanism

Tags:

java

hashcode

Looking at Java's String class, we can see that the hash code is cached after the first evaluation:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Here hash is an instance field. My question: why do we need the extra local variable h?

asked Apr 27 '17 09:04 by user3673623


People also ask

Does Java cache hashCode?

Yes. The cache is a private int field of the String itself (in OpenJDK it is declared "private int hash", without volatile). When the hash is first calculated, it is saved to that field. Initially the value is 0, the default for all numeric fields.

Are string hashes cached?

It is cached in a private int field in the string itself. It doesn't make any difference that different Strings may have the same hashcode ... because the hashcode is stored in the respective String objects.

Is Java string hashCode stable?

Yes. The String hash code is well defined and the same on any Java platform: the Javadoc of String.hashCode() specifies the exact formula, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] computed with int arithmetic. Other JVMs, such as JRockit or IBM's, therefore have to produce the same value; it is not an implementation detail.
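Because the formula is part of the specification, it can be checked directly. The sketch below is only an illustration (the names SpecHash and specHash are hypothetical): it evaluates the documented polynomial with Horner's rule and compares the result with the built-in String.hashCode().

```java
class SpecHash {

    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int arithmetic
    // (overflow wraps around), using Horner's rule like String.hashCode() does.
    static int specHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "abc", "hash code caching"};
        for (String s : samples) {
            // Prints true for every sample on a conforming JVM.
            System.out.println(specHash(s) == s.hashCode());
        }
    }
}
```

The empty string hashes to 0, which matches the "if (h == 0 && value.length > 0)" guard in the question's code: there is nothing to compute.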

Why does Java use 31 in hashCode() for String?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
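The information loss from an even multiplier is easy to demonstrate. In the sketch below (MultiplierDemo and hashWith are hypothetical names), the String polynomial is evaluated with a configurable multiplier. In an 8-character string the first character is multiplied by m^7; with m = 32 that is 2^35, which is 0 modulo 2^32, so the first character is shifted out entirely and strings differing only there collide. With 31, m^7 is odd and therefore never 0 modulo 2^32.

```java
class MultiplierDemo {

    // Same polynomial as String.hashCode(), but with a configurable multiplier.
    static int hashWith(int multiplier, String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 32^7 = 2^35 wraps to 0 in int arithmetic: the first char no longer matters.
        System.out.println(hashWith(32, "xaaaaaaa") == hashWith(32, "yaaaaaaa")); // true (collision)
        // 31^7 is odd, so a change in the first char still changes the hash.
        System.out.println(hashWith(31, "xaaaaaaa") == hashWith(31, "yaaaaaaa")); // false
    }
}
```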


3 Answers

Simply because the hash value changes inside the loop, so your version without the intermediate temporary variable would not be thread-safe. Consider this method being invoked from several threads.

Say thread-1 has started the hash computation, so hash is no longer 0. A moment later thread-2 invokes hashCode() on the same object and sees that hash is not 0, even though thread-1 hasn't finished its computation yet. As a result, thread-2 will use a wrong (not fully computed) hash value.
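To make this concrete, here is a sketch of the variant the question asks about, which accumulates directly in the field. UnsafeCachedHash and observedWrites are hypothetical names for illustration only; the list records every value written to hash, i.e. every value another thread could read while the loop is running. For "abc" the field passes through 97 and 3105 before reaching the final 96354, and a thread arriving mid-loop would return one of those non-zero partial values.

```java
import java.util.ArrayList;
import java.util.List;

class UnsafeCachedHash {
    private final char[] value;
    private int hash; // shared cache, 0 = "not computed yet"

    // Records each write to hash (single-threaded demo only; ArrayList is not thread-safe).
    final List<Integer> observedWrites = new ArrayList<>();

    UnsafeCachedHash(String s) {
        this.value = s.toCharArray();
    }

    // The question's proposal: no local variable, the field itself is the accumulator.
    int unsafeHashCode() {
        if (hash == 0 && value.length > 0) {
            for (int i = 0; i < value.length; i++) {
                hash = 31 * hash + value[i]; // partial result is now visible to other threads
                observedWrites.add(hash);
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        UnsafeCachedHash h = new UnsafeCachedHash("abc");
        System.out.println(h.unsafeHashCode()); // 96354, same as "abc".hashCode()
        System.out.println(h.observedWrites);   // [97, 3105, 96354]
    }
}
```

It can get worse: if two threads both see hash == 0 and enter the loop together, their updates of the same field interleave, so even the value that ends up cached may be wrong.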

answered Oct 07 '22 21:10 by Andremoniy


It's a simple and cheap substitute for synchronization.

If a thread invokes hashCode() for the first time and a second thread invokes it while the first one is still calculating the hash, the second thread would return an incorrect hash (an intermediate value of the first thread's calculation) if the method used the field directly.
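The same pattern can be reused in your own immutable classes. The sketch below uses a hypothetical ImmutablePoint class. It works without a lock because the fields are final (every thread computes the same value), int writes are atomic (a reader sees either 0 or the finished value), and the local h guarantees that only the finished value is ever stored. Strictly speaking this is a deliberate, benign data race rather than synchronization; Effective Java calls it the racy single-check idiom.

```java
final class ImmutablePoint {
    private final int x;
    private final int y;
    private int hash; // 0 means "not computed yet"; no volatile, no lock

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutablePoint)) return false;
        ImmutablePoint p = (ImmutablePoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        int h = hash;        // read the shared field exactly once
        if (h == 0) {
            h = 31 * x + y;  // compute in the local variable
            hash = h;        // publish only the finished value
        }
        return h;            // return the local, not a second read of hash
    }

    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        System.out.println(p.hashCode()); // 33, computed and cached
        System.out.println(p.hashCode()); // 33, served from the cache
    }
}
```

Returning h rather than re-reading hash also matters: under the Java Memory Model two unsynchronized reads of the same field may be reordered, so a version that checks hash and then returns hash could in principle return 0. As with String, an object whose real hash is 0 (here, the point (0, 0)) simply recomputes it on every call.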

answered Oct 07 '22 21:10 by Mario


To put it very simply: the local primitive h is, well, local, and therefore thread-safe, as opposed to hash, which is shared between threads.
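Each invocation of hashCode() gets its own copy of h on the calling thread's stack, so partial results never become visible to other threads. The sketch below (LocalHashDemo, CachedHash and hashFromManyThreads are hypothetical names) calls a String-style hashCode() on one shared object from several pool threads; every thread should report the same, fully computed value. A passing run only illustrates the point; no test run can prove the absence of races.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalHashDemo {

    // Hypothetical String-like holder using the same caching pattern as String.hashCode().
    static final class CachedHash {
        private final char[] value;
        private int hash; // shared cache, 0 = "not computed yet"

        CachedHash(String s) {
            value = s.toCharArray();
        }

        @Override
        public int hashCode() {
            int h = hash;             // local copy, private to this invocation
            if (h == 0 && value.length > 0) {
                for (char c : value) {
                    h = 31 * h + c;   // partial results stay in the local variable
                }
                hash = h;             // only the finished value is shared
            }
            return h;
        }
    }

    // Calls hashCode() on one shared object from several pool threads.
    static List<Integer> hashFromManyThreads(String s, int threads) {
        CachedHash shared = new CachedHash(s);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = shared::hashCode;
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String s = "hash code caching";
        List<Integer> results = hashFromManyThreads(s, 8);
        // Every thread reports the same value as String.hashCode().
        System.out.println(results.stream().allMatch(r -> r == s.hashCode()));
    }
}
```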

answered Oct 07 '22 21:10 by Eugene