Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 8 odd timing/memory issue

Tags:

I've run across a rather strange problem that I can create when running Java 8. The problem presents itself as if some sort of timing error is happening within the JVM itself. It is intermittent in nature, but easily reproducible (at least within my test environments). The problem is that an array value being explicitly set is being destroyed and replaced with a 0.0 under certain circumstances. Specifically, in the code below, array[0] is evaluating to 0.0 after the line new Double(r.nextDouble());. Then, if you immediately look at the contents of array[0] again, it now shows the value to be the correct value of 1.0. Sample output from running this test case is:

claims array[0] != 1.0....array[0] = 1.0
claims array[0] now == 1.0...array[0] = 1.0`

I'm running 64-bit Windows 7 and am able to reproduce this problem, from both within Eclipse and when compiling from the command line, with JDKs 1.8_45, 1.8_51, and 1.8_60. I am unable to produce the problem running 1.7_51. The same results have been demonstrated on another 64-bit Windows 7 box.

This problem appeared in a large, non-trivial piece of software, but I've managed to condense it down to a few lines of code. Below is a small test case that demonstrates the issue. It is a rather odd looking test case, but appears to all be necessary to cause the error. The use of Random is not required - I can replace all r.nextDouble() with any double value and demonstrate the issue. Interestingly enough, if someArray[0] = .45; is replaced with someArray[0] = r.nextDouble();, I could not replicate the issue (although there is nothing special about .45). Eclipse debugging is also of no help - it changes the timing enough such that it doesn't happen anymore. Even a well placed System.err.println() statement will cause the issue to no longer appear.

Again, the issue is intermittent, so to reproduce the issue one might have to run this test case several times. I think the most I've had to run it is around 10 times before I get the output shown above. In Eclipse, I give it a second or two after running and then kill it if it hasn't happened. From the command line the same - run it, if it doesn't happen CTRL+C to quit and try again. It appears that if it's going to happen, it happens pretty quickly.

I've come across issues like this in the past, but they were all threading issues. I can't figure out what's going on here - I've even looked at bytecode (which was identical between 1.7_51 and 1.8_45, by the way).

Any ideas on what is happening here?

import java.util.Random;

public class Test { 
    Test(){
        double array[] = new double[1];     
        Random r = new Random();

        while(true){
            double someArray[] = new double[1];         
            double someArray2 [] = new double [2];

            for(int i = 0; i < someArray2.length; i++) {
                someArray2[i] = r.nextDouble();
            }

            // for whatever reason, using r.nextDouble() here doesn't seem
            // to show the problem, but the # you use doesn't seem to matter either...

            someArray[0] = .45;

            array[0] = 1.0;

            // commented out lines also demonstrate problem
            new Double(r.nextDouble());
            // new Float(r.nextDouble();
            // double d = new Double(.1) * new Double(.3);
            // double d = new Double(.1) / new Double(.3);
            // double d = new Double(.1) + new Double(.3);
            // double d = new Double(.1) - new Double(.3);

            if(array[0] != 1.0){
                System.err.println("claims array[0] != 1.0....array[0] = " + array[0]);

                if(array[0] != 1.0){
                    System.err.println("claims array[0] still != 1.0...array[0] = " + array[0]);
                }else {
                    System.err.println("claims array[0] now == 1.0...array[0] = " + array[0]);
                }

                System.exit(0);
            }else if(r.nextBoolean()){
                array = new double[1];
            }
        }
    }

    public static void main(String[] args) {
        new Test();
    }
}
like image 902
bcothren Avatar asked Oct 07 '15 14:10

bcothren


1 Answers

Update: seems that my original answer was incorrect and OnStackReplacement just revealed the problem in this particular case, but the original bug was in the escape analysis code. Escape analysis is a compiler subsystem which determines whether object escapes from the given method or not. Non-escaped objects can be scalarized (instead of on-heap allocation) or totally optimized out. In our test escape analysis does matter as several created objects surely don't escape the method.

I downloaded and installed JDK 9 early access build 83 and noticed that the bug disappears there. However in JDK 9 early access build 82 it still exists. The changelog between b82 and b83 shows only one relevant bug fix (correct me if I'm wrong): JDK-8134031 "Incorrect JIT compilation of complex code with inlining and escape analysis". The committed testcase is somewhat similar: big loop, several boxes (similar to one-element arrays in our test) which lead to the sudden change of the value inside the box, so the result becomes silently incorrect (no crash, no exception, just incorrect value). As in our case it's reported that problem does not appear prior to 8u40. The introduced fix is very short: just a one line change in escape analysis source.

According to OpenJDK bug tracker, the fix is already backported to JDK 8u72 branch, which is scheduled to be released in January, 2016. Seems that it was too late to backport this fix to the upcoming 8u66.

The suggested work-around is to disable escape analysis (-XX:-DoEscapeAnalysis) or to disable eliminate allocations optimization (-XX:-EliminateAllocations). Thus @apangin was actually closer to the answer than me.

Below is the original answer


First, I cannot reproduce the problem with JDK 8u25, but can on JDK 8u40 and 8u60: sometimes it runs correctly (stuck in infinite loop), sometimes it outputs and exits. So if JDK downgrade to 8u25 is acceptable for you, you may consider to do this. Note that if you need later fixes in javac (many things especially involving lambdas were fixed in 1.8u40), you can compile with newer javac, but run on older JVM.

For me it seems that this particular problem is likely a bug in OnStackReplacement mechanism (when OSR occurs at tier 4). If you're unfamiliar with OSR, you may read this answer. The OSR surely occurs in your case, but in a little bit weird way. Here's -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+TraceNMethodInstalls for failed run (% means OSR JIT, @ 28 means OSR bytecode position, (3) and (4) means tier level):

...
     91   37 %     3       Test::<init> @ 28 (194 bytes)
Installing osr method (3) Test.<init>()V @ 28
     93   38       3       Test::<init> (194 bytes)
Installing method (3) Test.<init>()V 
     94   39 %     4       Test::<init> @ 16 (194 bytes)
Installing osr method (4) Test.<init>()V @ 16
    102   40 %     4       Test::<init> @ 28 (194 bytes)
    103   39 %     4       Test::<init> @ -2 (194 bytes)   made not entrant
...
Installing osr method (4) Test.<init>()V @ 28
    113   37 %     3       Test::<init> @ -2 (194 bytes)   made not entrant
claims array[0] != 1.0....array[0] = 1.0
claims array[0] now == 1.0...array[0] = 1.0

Thus the OSR at tier4 occurs for two different bytecode offsets: offset 16 (which is the while loop entry point) and offset 28 (which is the nested for loop entry point). It seems that some race condition occurs during the context transfer between both OSR-compiled versions of your method which results in broken context. When execution is handed over to OSR method, it should transfer current context including the values of local variables like array and r into the OSR'ed method. Something bad happens here: probably for a short time <init>@16 OSR version works, then it's replaced with <init>@28, but context is updated with a little delay. It's likely that OSR context transfer interferes with "eliminate allocations" optimization (as noted by @apangin switching off this optimization helps in your case). My expertise is not enough to dig further here, probably @apangin may comment.

In contrast in normal run only one copy of tier 4 OSR method is created and installed:

...
Installing method (3) Test.<init>()V 
     88   43 %     4       Test::<init> @ 28 (194 bytes)
Installing osr method (4) Test.<init>()V @ 28
    100   40 %     3       Test::<init> @ -2 (194 bytes)   made not entrant
   4592   44       3       java.lang.StringBuilder::append (8 bytes)
...

So seems that in this case no race between two OSR versions occur and everything works perfectly.

The problem also disappears if you move the outer loop body to the separate method:

import java.util.Random;

public class Test2 {
    private static void doTest(double[] array, Random r) {
        double someArray[] = new double[1];
        double someArray2[] = new double[2];

        for (int i = 0; i < someArray2.length; i++) {
            someArray2[i] = r.nextDouble();
        }

        ... // rest of your code
    }

    Test2() {
        double array[] = new double[1];
        Random r = new Random();

        while (true) {
            doTest(array, r);
        }
    }

    public static void main(String[] args) {
        new Test2();
    }
}

Also manual unrolling the nested for loop removes the bug:

int i=0;
someArray2[i++] = r.nextDouble();
someArray2[i++] = r.nextDouble();

To hit this bug it seems that you should have at least two nested loops in the same method, so OSR can occur at different bytecode positions. So to work-around problem in your particular piece of code you can just do the same: extract the loop body into the separate method.

An alternative solution is to disable the OSR completely with -XX:-UseOnStackReplacement. It's rarely helps in production code. Loop counters still work and if your method with many-iterations-loop is called at least twice, the second run will be JIT-compiled anyways. Also even if your method with long loop is not JIT-compiled due to disabled OSR, any methods it calls will still be JIT-compiled.

like image 99
Tagir Valeev Avatar answered Sep 30 '22 06:09

Tagir Valeev