Here is the code of Java's String.hashCode() method:
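    public int hashCode() {
        int h = hash;                  // read the cached value once into a local
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;                  // cache the result for later calls
        }
        return h;                      // return the local, not the field
    }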
As you can see, it checks whether the cached hash (the field "private int hash") is still 0 and, if so, computes and stores it. The constructors do not always set this field, which is why the check is there in the first place.
So although it would be quite hard to reproduce in practice, it looks like one could have a race condition on this hash, right?
I mean, once you have put the string in a HashMap, for example, it would be safe, unless you had first handed it off to another thread. But if the same string were visible to two threads and both added it to a HashMap simultaneously, the hashCode function could pick up a partially written "hash" value and return it.
Theoretically, one can write code in which multiple threads simultaneously read the zero-valued hash and all enter the calculation branch. That would be "wasteful" but safe, since the function operates on the immutable characters and every thread calculates the exact same hash value.
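A minimal sketch of that scenario, for illustration only: several threads call hashCode() on the same fresh String at roughly the same time, and the distinct results are collected. The class name ConcurrentHashDemo and the method observedHashes are made up for this example. Whether any two threads really overlap inside the calculation depends on scheduling, so this cannot prove the absence of a race; it only shows that every thread ends up with the same value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of the JDK.
final class ConcurrentHashDemo {

    // Calls hashCode() on one shared String from several threads
    // and returns the set of distinct values that were observed.
    static Set<Integer> observedHashes(String shared, int threadCount) {
        Set<Integer> results = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();                  // line the threads up
                    results.add(shared.hashCode()); // possibly concurrent computation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown();                          // release all threads together
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Built from a char[] so the instance starts without a cached hash.
        String fresh = new String("some reasonably long key".toCharArray());
        System.out.println(observedHashes(fresh, 8).size()); // one distinct value
    }
}
```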
Reading and writing the hash field is not properly synchronized according to the Java Memory Model, but it is safe nevertheless. If multiple threads write to hash then, because a String object is immutable, every calculation necessarily yields the same result. Assume that this result is x. Then any thread is guaranteed to observe either 0 or x, because reads and writes of an int are atomic on every JVM (the Java Language Specification guarantees this for all primitive types except long and double). If a thread observes 0, it simply recalculates the hash code, which is guaranteed to yield x, so at worst it overwrites the field with the same value that another thread has already written but that is not yet visible to it.
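The same idiom can be sketched on a user-defined immutable class. CachedHashKey below is a hypothetical name for this example and is not how the JDK is implemented beyond mirroring the method quoted in the question. One detail worth noticing is that the field is read exactly once into the local h and the local is what gets returned; a variant that tested the field and then read it again for the return statement could, under the memory model's reordering rules, return 0 even after seeing a non-zero value.

```java
import java.util.Arrays;

// Hypothetical immutable key that caches its hash code the way String does.
final class CachedHashKey {
    private final char[] chars; // never modified after construction
    private int hash;           // 0 means "not computed yet"; deliberately not volatile

    CachedHashKey(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;           // single read of the shared field
        if (h == 0 && chars.length > 0) {
            for (char c : chars) {
                h = 31 * h + c;
            }
            hash = h;           // benign race: every thread writes the same value
        }
        return h;               // return the local copy, never re-read the field
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashKey
                && Arrays.equals(chars, ((CachedHashKey) o).chars);
    }
}
```

As with String itself, a key whose real hash happens to be 0 is simply recomputed on every call, which costs time but does not affect correctness.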
In this sense, the outcome is deterministic. At the same time, threads are not required to synchronize in order to share this instance. Assume that you have some key "foo" that is used throughout your application by all of your threads. Because Java interns string constants, this one instance would be shared among all of your threads, which would then have to synchronize merely to spare themselves the trouble of recomputing the hash code. Computing the hash code is, however, a very cheap operation, whereas synchronization is comparatively expensive. Since correctness is given either way, this optimization makes sense.
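To make the sharing of string constants concrete, here is a small sketch; InternDemo and sameInstance are names invented for this example. It compares references on purpose, which is normally a mistake for strings but is exactly what shows that every "foo" literal refers to one object.

```java
// Hypothetical demo class showing that string constants share one instance.
final class InternDemo {

    // Reference comparison on purpose: are these the very same object?
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "foo";
        String b = "foo";
        String c = new String("foo");                    // explicitly a separate object
        System.out.println(sameInstance(a, b));          // true
        System.out.println(sameInstance(a, c));          // false
        System.out.println(sameInstance(a, c.intern())); // true
    }
}
```

Since every thread that uses the literal holds the same object, they also share its cached hash field, which is the situation the paragraph above describes.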