I was reading this blogpost.
And the author was talking about breaking the hashCode()
in String
in multithread environment.
By having:
public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; } return h; }
Changed to:
public int hashCode() { if (hash == 0) { int off = offset; char val[] = value; int len = count; int h = 0; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; } return hash; }
Which the author says and I quote:
"What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!"
So further going through comments, someone says it can be reordered to
int h = hash; if (hash == 0) { ... } return h;
How is that possible? I thought reordering only involves moving program statements up and down. What rules is it following? I've Googled, read the JSR133 FAQ, checked the Java Concurrency in Practice book, but I can't seem to find a place that helps me to understand particularly on reordering. If anyone can point me to the right direction, I would really appreciate it.
Thanks to Louis clarifying the meaning of "Reordering", I wasn't thinking in terms of "byteCode"
However, I still don't understand why is it allowed to move the 2nd Read to the front, this is my naive attempt to translate it to somewhat "bytecode" format.
For simplification purpose, operations that are used to calculate the hashcode are express as calchash()
. Therefore, I express the program as:
if (hash == 0) { h = calchash(); hash = h; } return hash;
And my attempt to express it in "bytecode" form:
R1,R2,R3 are in the operands stack, or the registers h is in the array of local variables
In program order:
if (hash == 0) { ---------- R1 = read hash from memory (1st read) ---------- Compare (R1 == 0) h = calchash(); ---------- R2 = calchash() ---------- h = R2 (Storing the R2 to local variable h) hash = h; ---------- Hash = h (write to hash) } return hash ---------- R3 = read hash from memory again(2nd read) ---------- return R3
Reordered transformation (My version based on comments):
---------- R3 = read hash from memory (2nd read) *moved* if (hash == 0) { ---------- R1 = read hash from memory (1st read) ---------- Compare (R1 == 0) h = calchash(); ---------- R2 = calchash() ---------- h = R2 (Storing the R2 to local variable h) hash = h; ---------- hash = h (write to hash) } return hash ---------- return R3
Checking the comments again, I found this answered by the author:
Reordered Transformation (From the blog)
r1 = hash; if (hash == 0) { r1 = hash = // calculate hash } return r1;
This case actually works on single thread, but it's possible to fail with multiple threads.
It seems that the JVM are making simplifications based on
h = hash and it simplifies the use of R1, R2, R3 to single R1
Therefore, JVM does more than reordering instructions, it also seems reducing the amount of registers being used.
Happens-before is not any keyword or object in the Java language, it is simply a discipline put into place so that in a multi-threading environment the reordering of the surrounding instructions does not result in a code that produces incorrect output.
The Happens-before relationship provides some sort of ordering and visibility guarantee. There is a lot of rule concerning Happens-before. Still, the most important one is if there is a synchronization like a synchronized block or a volatile variable then. - “A volatile write will happen before another volatile read.”
In your modified code:
public int hashCode() { if (hash == 0) { // (1) int off = offset; char val[] = value; int len = count; int h = 0; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; } return hash; // (2) }
(1) and (2) could be reordered: (1) could read a non null value while (2) would read 0. That can't happen in the actual implementation in the String class because the calculation is made on the local variable and the return value is also that local variable, which, by definition, is thread safe.
The issue is that the Java Memory Model provides no guarantee when a shared variable (hash
) is accessed without proper synchronization - in particular it does not guarantee that all executions will be sequentially consistent. Had hash
been volatile, there would be no problem with the modified code.
ps: the author of that blog, I believe, is one of the writers of the Chapter 17 (Java Memory Model) of the JLS - so I would tend to believe him anyway ;-)
UPDATE
Following the various edits / comments - let's look at the bytecode in more details with these two methods (I assume that the hashcode is always 1 to keep things simple):
public int hashcode_shared() { if (hash == 0) { hash = 1; } return hash; } public int hashcode_local() { int h = hash; if (h == 0) { hash = h = 1; } return h; }
The java compiler on my machine generates the following bytecode:
public int hashcode_shared(); 0: aload_0 //read this 1: getfield #6 //read hash (r1) 4: ifne 12 //compare r1 with 0 7: aload_0 //read this 8: iconst_1 //constant 1 9: putfield #6 //put 1 into hash (w1) 12: aload_0 //read this 13: getfield #6 //read hash (r2) 16: ireturn //return r2 public int hashcode_local(); 0: aload_0 //read this 1: getfield #6 //read hash (r1) 4: istore_1 //store r1 in local variable h 5: iload_1 //read h 6: ifne 16 //compare h with 0 9: aload_0 //read this 10: iconst_1 //constant 1 11: dup //constant again 12: istore_1 //store 1 into h 13: putfield #6 //store 1 into hash (w1) 16: iload_1 //read h 17: ireturn //return h
In the first example, there are 2 reads of the shared variable hash
: r1 and r2. As discussed above, because there is no synchronization and the variable is shared, the Java Memory Model applies and a compiler/JVM is allowed to reorder the two reads: line #13 could be inserted before line #1*.
In the second example, all the operations on h
, the local variable, need to be sequentially consistent because because of intra-thread semantics and program order guarantee on non-shared variables.
Note: as always, the fact that the reordering is allowed does not mean it will be performed. It is actually unlikely to happen on current x86/hotspot combinations. But it could happen on other current or future architectures/JVM.
*That is a bit of a shortcut, what could happen in practice is that the compiler might rewrite hashcode_shared
like this:
public int hashcode_shared() { int h = hash; if (hash != 0) return h; return (hash = 1); }
The code is strictly equivalent in a single threaded environment (it will always return the same value as the original method) so the reordering is allowed. But in a multi-threaded environment, it is clear that if hash
is changed from 0 to 1 by another thread between the first two lines, this reordered method will incorrectly return 0.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With