I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized
keyword, Broke Synchronization
should never be printed from the class TestBuffer.
Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.
Sync Issue - Controller.java
Sync Issue - SyncTest.java
Sync Issue - TestBuffer.java
Sync Issue - Tuple3f.java
And here is the output I get when I run it:
java -cp . SyncTest Before Adding Creating a TestBuffer Before Remove Broke Synchronization 1365192 Broke Synchronization 1365193 Broke Synchronization 1365194 Broke Synchronization 1365195 Broke Synchronization 1365196 Done
UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition
Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.
ORIGINAL QUESTION:
I have a class which provides the following methods:
I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized
. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer
should always be set back to false before remove
can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException
.
Very strange race condition which looks like a JRE issue
This was confirmed as a bug in Java by Sun. It is apparently fixed (unknowingly?) in jdk7u4, but they have not backported the fix to jdk6. Bug ID: 7176993
A synchronized block in Java is synchronized on some object. All synchronized blocks synchronize on the same object can only have one thread executing inside them at a time. All other threads attempting to enter the synchronized block are blocked until the thread inside the synchronized block exits the block.
Java synchronized volatile [About] => synchronized. synchronized block in Java is a monitor in multithreading. synchronized block with the same object/class can be executed by only single thread, all others are waiting. It can help with race condition situation when several threads try to update the same variable.
You can have both static synchronized method and nonstatic synchronized method and synchronized blocks in Java but we can not have synchronized variable in java. Using synchronized keyword with a variable is illegal and will result in compilation error.
I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.
Running as:
java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest
you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:
68 1 java.lang.String::hashCode (64 bytes) 97 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes) 104 3 java.math.BigInteger::mulAdd (81 bytes) 106 4 java.math.BigInteger::multiplyToLen (219 bytes) 111 5 java.math.BigInteger::addOne (77 bytes) 113 6 java.math.BigInteger::squareToLen (172 bytes) 114 7 java.math.BigInteger::primitiveLeftShift (79 bytes) 116 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes) 121 8 java.math.BigInteger::montReduce (99 bytes) 126 9 sun.security.provider.SHA::implCompress (491 bytes) 138 10 java.lang.String::charAt (33 bytes) 139 11 java.util.ArrayList::ensureCapacity (58 bytes) 139 12 java.util.ArrayList::add (29 bytes) 139 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes) 158 13 java.util.HashMap::indexFor (6 bytes) 159 14 java.util.HashMap::hash (23 bytes) 159 15 java.util.HashMap::get (79 bytes) 159 16 java.lang.Integer::valueOf (32 bytes) 168 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes) 168 18 s phil.StrangeRaceConditionTest::remove (10 bytes) 171 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes) 172 3% phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes) ERRORS //my little change 219 15 made not entrant java.util.HashMap::get (79 bytes)
There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:
exclude phil/StrangeRaceConditionTest strangeRaceConditionTest
When you run the program again, you get this output:
CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest 73 1 java.lang.String::hashCode (64 bytes) 104 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes) 110 3 java.math.BigInteger::mulAdd (81 bytes) 113 4 java.math.BigInteger::multiplyToLen (219 bytes) 118 5 java.math.BigInteger::addOne (77 bytes) 120 6 java.math.BigInteger::squareToLen (172 bytes) 121 7 java.math.BigInteger::primitiveLeftShift (79 bytes) 123 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes) 128 8 java.math.BigInteger::montReduce (99 bytes) 133 9 sun.security.provider.SHA::implCompress (491 bytes) 145 10 java.lang.String::charAt (33 bytes) 145 11 java.util.ArrayList::ensureCapacity (58 bytes) 146 12 java.util.ArrayList::add (29 bytes) 146 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes) 165 13 java.util.HashMap::indexFor (6 bytes) 165 14 java.util.HashMap::hash (23 bytes) 165 15 java.util.HashMap::get (79 bytes) 166 16 java.lang.Integer::valueOf (32 bytes) 174 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes) 174 18 s phil.StrangeRaceConditionTest::remove (10 bytes) ### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest 177 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes) 324 15 made not entrant java.util.HashMap::get (79 bytes)
and the problem does not appear (at least not in the repeated attempts that I've made).
Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.
java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest
Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.
See here for info on the JVM options.
See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With