
Is Java's "String" hashcode function thread-safe if its cache setter does not use locks?

Here is the code of Java's String hashCode function:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

As you can see, it checks whether the cached hash (the field "private int hash") is 0 and, if so, computes and stores it. The constructor does not always set this value, which is presumably why the check exists.

So although it would be quite hard to reproduce in practice, it looks like there could be a race condition on this hash, right?

I mean, once you put the string in a HashMap, for example, it would be safe, unless you had first handed it off to another thread. But if the same string were held by two threads and added to a HashMap by both simultaneously, hashCode() could pick up a partially written "hash" value and return it.

user1122069 asked Jan 17 '17 18:01


2 Answers

In theory, several threads can simultaneously read the zero-valued hash and all enter the calculation. That is wasteful but safe: the function operates on immutable characters, so each thread computes exactly the same hash value.
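A minimal sketch of that scenario, for illustration only: several threads call hashCode() on the same freshly created String at roughly the same time, and the distinct results are collected. The class ConcurrentHashCodeDemo and its method observedHashes are hypothetical names, not part of the JDK. A run like this cannot prove thread safety; it only shows what the argument predicts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; the names are illustrative, not from the JDK.
class ConcurrentHashCodeDemo {

    // Starts `threads` threads that call hashCode() on the same String at
    // roughly the same time and returns the set of distinct results seen.
    static Set<Integer> observedHashes(String shared, int threads) {
        Set<Integer> results = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line the threads up to widen the race window
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                results.add(shared.hashCode());
            });
            workers[i].start();
        }
        start.countDown(); // release all workers at once
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // new String(char[]) creates a fresh instance whose hash is not cached yet.
        String fresh = new String(new char[] {'f', 'o', 'o'});
        System.out.println(observedHashes(fresh, 8)); // prints [101574]
    }
}
```

However the threads interleave, the set should contain a single value, because every thread either reads the cached result or recomputes the same one.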

Amit answered Oct 06 '22 01:10


Reading and writing the hash field is not properly synchronized according to the Java Memory Model, but it is safe nevertheless.

If multiple threads write to hash, the immutability of a String guarantees that every computation yields the same result. Call that result x. Any thread is then guaranteed to observe either 0 or x, because the Java Language Specification guarantees that reads and writes of an int are atomic, so a partially written value can never be seen. A thread that observes 0 simply recomputes the hash code, which again yields x, and stores it; at worst it overwrites an x that another thread stored concurrently or that was not yet visible to it. Note also that hashCode() reads the field only once, into the local variable h, and returns that local, so it never returns the result of a second, possibly stale read.
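The same lazy-caching pattern can be sketched on a small class of your own. CachedKey below is a hypothetical example, not JDK code; it assumes the wrapped characters never change after construction, which is what makes the unsynchronized write harmless.

```java
import java.util.Arrays;

// Hypothetical immutable key class; not part of the JDK.
final class CachedKey {
    private final char[] chars; // private copy, never modified after construction
    private int hash;           // 0 means "not computed yet"; deliberately not volatile

    CachedKey(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash; // read the shared field exactly once
        if (h == 0 && chars.length > 0) {
            for (char c : chars) {
                h = 31 * h + c;
            }
            hash = h; // racy write, but every thread stores the same value
        }
        return h; // return the local copy, never re-read the field
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey && Arrays.equals(chars, ((CachedKey) o).chars);
    }
}
```

Returning the local h rather than the field is the important design choice: a version that ends with "return hash;" performs a second unsynchronized read, which the memory model allows to be reordered so that it yields 0 even after the first read saw a nonzero value.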

In this sense, the outcome is deterministic. At the same time, threads do not need to synchronize in order to share the instance. Suppose some key "foo" is used by all threads throughout your application. Because Java interns string literals, this constant is a single instance shared among all of your threads; synchronizing on it would only save them the trouble of recomputing the hash code. Computing the hash code is a cheap operation, however, whereas synchronization is comparatively expensive. Since correctness is guaranteed either way, this optimization makes sense.
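A short sketch of why a literal such as "foo" ends up as one shared instance. InternDemo and its method names are hypothetical and exist only for this illustration; the behavior shown is that of string literals and String.intern().

```java
// Hypothetical demo class; the names are illustrative only.
class InternDemo {

    // Two occurrences of the same literal refer to the same String object.
    static boolean literalsShareInstance() {
        String a = "foo";
        String b = "foo";
        return a == b; // reference comparison on purpose
    }

    // new String(...) always creates a distinct object with equal contents.
    static boolean copyIsDistinctInstance() {
        String copy = new String("foo");
        return copy != "foo" && copy.equals("foo");
    }

    // intern() maps an equal string back to the shared, pooled instance.
    static boolean internReturnsSharedInstance() {
        return new String("foo").intern() == "foo";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());
        System.out.println(copyIsDistinctInstance());
        System.out.println(internReturnsSharedInstance());
    }
}
```

Every thread that uses the literal therefore works with the same object and, after the first computation becomes visible to it, the same cached hash.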

Rafael Winterhalter answered Oct 06 '22 01:10