 

JIT recompiles to do fast throw after more iterations if the stack trace is of even length

Tags: java, jvm, jit

The following code,

public class TestFastThrow {

    public static void main(String[] args) {
        int count = 0;
        int exceptionStackTraceSize = 0;
        Exception exception = null;
        do {
            try {
                throwsNPE(1);
            }
            catch (Exception e) {
                exception = e;
                if (exception.getStackTrace().length != 0) {
                    exceptionStackTraceSize = exception.getStackTrace().length;
                    count++;
                }
            }
        }
        while (exception.getStackTrace().length != 0);
        System.out.println("Iterations to fastThrow :" + count + ", StackTraceSize :" + exceptionStackTraceSize);
    }

    static void throwsNPE(int callStackLength) {
        throwsNPE(callStackLength, 0);
    }

    static void throwsNPE(int callStackLength, int count) {
        if (count == callStackLength) {
            ((Object) null).getClass();
        }
        else {
            throwsNPE(callStackLength, count + 1);
        }
    }

}

gives the following output when run multiple times:

Iterations to fastThrow :5517, StackTraceSize :4
Iterations to fastThrow :2825, StackTraceSize :5
Iterations to fastThrow :471033, StackTraceSize :6
Iterations to fastThrow :1731, StackTraceSize :7
Iterations to fastThrow :157094, StackTraceSize :10
...
Iterations to fastThrow :64587, StackTraceSize :20
Iterations to fastThrow :578, StackTraceSize :29

VM details

Java HotSpot(TM) 64-Bit Server VM (11.0.5+10-LTS) for bsd-amd64 JRE (11.0.5+10-LTS)
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly

What is surprising: why does the JIT take many more iterations to optimize when the stack trace is of even length?

I enabled JIT logs and analysed them with JITWatch, but couldn't see anything helpful, other than that the C1 and C2 compilation happens later for even-sized stack traces.

The timeline is something like this (looking at when java.lang.Throwable.getStackTrace() is compiled):

| StackSize     | 10    | 11    |
|---------------|-------|-------|
| Queued for C1 | 1.099 | 1.012 |
| C1            | 1.318 | 1.162 |
| Queued for C2 | 1.446 | 1.192 |
| C2            | 1.495 | 1.325 |

Why exactly is this happening? And what heuristics does the JIT use for fast throw?

asked Mar 25 '20 by Adwait Kumar

1 Answer

This effect results from the interplay of tiered compilation and the inlining policy.

Let me explain with a simplified example:

public class TestFastThrow {

    public static void main(String[] args) {
        for (int iteration = 0; ; iteration++) {
            try {
                throwsNPE(2);
            } catch (Exception e) {
                if (e.getStackTrace().length == 0) {
                    System.out.println("Iterations to fastThrow: " + iteration);
                    break;
                }
            }
        }
    }

    static void throwsNPE(int depth) {
        if (depth <= 1) {
            ((Object) null).getClass();
        }
        throwsNPE(depth - 1);
    }
}

For simplicity, I'll exclude all methods except throwsNPE from compilation:

-XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:+PrintCompilation
  1. HotSpot uses tiered compilation by default. Here throwsNPE is first compiled at tier 3 (C1 with profiling). Profiling in C1 makes it possible to recompile the method later with C2.

  2. The OmitStackTraceInFastThrow optimization works only in C2-compiled code. So, the sooner the code is compiled by C2, the fewer iterations pass before the loop finishes.

  3. How profiling in C1-compiled code works: a counter is incremented on every method invocation and on every backward branch (though there are no backward branches in the throwsNPE method). When the counter reaches a certain configurable threshold, the JVM compilation policy decides whether the current method needs to be recompiled.

  4. throwsNPE is a recursive method. HotSpot can inline recursive calls up to -XX:MaxRecursiveInlineLevel (default value is 1).

  5. How often C1-compiled code calls back into the JVM compilation policy differs between regular and inlined invocations. A regular method notifies the JVM every 2^10 invocations (-XX:Tier3InvokeNotifyFreqLog=10), while an inlined method notifies the JVM much more rarely: every 2^20 invocations (-XX:Tier23InlineeNotifyFreqLog=20).

  6. When the number of recursive calls is even, all invocations are covered by inlining and follow the Tier23InlineeNotifyFreqLog parameter. When the number of calls is odd, inlining does not cover the last leftover call, and that last invocation follows the Tier3InvokeNotifyFreqLog parameter.

  7. This means that when the call depth is even, throwsNPE will be recompiled only after 2^20 calls, i.e. after 2^19 loop iterations. That's exactly what you'll see when you run the above code with throwsNPE(2):

    Iterations to fastThrow: 524536
    

    524536 is very close to 2^19 = 524288

    Now, if you run the same application with -XX:Tier23InlineeNotifyFreqLog=15, the number of iterations will be close to 2^14 = 16384.

    Iterations to fastThrow: 16612
    
  8. Now let's change the code to call throwsNPE(1). The program finishes very quickly, regardless of the Tier23InlineeNotifyFreqLog value, because a different option, Tier3InvokeNotifyFreqLog, now governs. But if I rerun the program with -XX:Tier3InvokeNotifyFreqLog=20, the loop will finish no earlier than after 2^20 iterations:

    Iterations to fastThrow: 1048994
    
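The even/odd asymmetry described in the steps above can be sketched as a toy model. This is not HotSpot internals, just an illustration of the power-of-two notification thresholds named by the two flags (class and method names here are made up):

```java
public class NotifyThresholdModel {
    // Toy model of the heuristic described above: C1-profiled code notifies
    // the compilation policy when the invocation counter crosses a
    // power-of-two boundary given by the relevant *FreqLog flag.
    static long invocationsUntilNotify(int freqLog) {
        return 1L << freqLog;
    }

    public static void main(String[] args) {
        // Odd call depth: the leftover non-inlined call notifies per
        // -XX:Tier3InvokeNotifyFreqLog=10 (the default).
        System.out.println("odd depth:  ~" + invocationsUntilNotify(10) + " invocations");
        // Even call depth: every call is covered by inlining, so
        // -XX:Tier23InlineeNotifyFreqLog=20 (the default) applies.
        System.out.println("even depth: ~" + invocationsUntilNotify(20) + " invocations");
    }
}
```

The three orders of magnitude between 1024 and 1048576 are what separate the "fast" odd-depth runs from the "slow" even-depth runs in the question's output.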

Summary

Fast throw optimization applies only to C2-compiled code. Due to one level of recursive inlining (-XX:MaxRecursiveInlineLevel), C2 compilation is triggered either earlier (after 2^Tier3InvokeNotifyFreqLog invocations, if the number of recursive calls is odd) or later (after 2^Tier23InlineeNotifyFreqLog invocations, if all recursive calls are covered by inlining).
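As a quick sanity check of the arithmetic above (assuming the simplified throwsNPE(2) example, which performs two throwsNPE invocations per loop iteration):

```java
public class FastThrowArithmetic {
    public static void main(String[] args) {
        // throwsNPE(2) -> throwsNPE(1): two invocations per loop iteration
        long inlineeNotify = 1L << 20;  // 2^Tier23InlineeNotifyFreqLog (default 20)
        long callsPerIteration = 2;
        long expectedIterations = inlineeNotify / callsPerIteration;
        System.out.println(expectedIterations); // 524288, close to the observed 524536
    }
}
```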

answered Nov 19 '22 by apangin