The following code,
public class TestFastThrow {

    public static void main(String[] args) {
        int count = 0;
        int exceptionStackTraceSize = 0;
        Exception exception = null;
        do {
            try {
                throwsNPE(1);
            } catch (Exception e) {
                exception = e;
                if (exception.getStackTrace().length != 0) {
                    exceptionStackTraceSize = exception.getStackTrace().length;
                    count++;
                }
            }
        } while (exception.getStackTrace().length != 0);
        System.out.println("Iterations to fastThrow :" + count + ", StackTraceSize :" + exceptionStackTraceSize);
    }

    static void throwsNPE(int callStackLength) {
        throwsNPE(callStackLength, 0);
    }

    static void throwsNPE(int callStackLength, int count) {
        if (count == callStackLength) {
            ((Object) null).getClass();
        } else {
            throwsNPE(callStackLength, count + 1);
        }
    }
}
gives the following output after running it multiple times:
Iterations to fastThrow :5517, StackTraceSize :4
Iterations to fastThrow :2825, StackTraceSize :5
Iterations to fastThrow :471033, StackTraceSize :6
Iterations to fastThrow :1731, StackTraceSize :7
Iterations to fastThrow :157094, StackTraceSize :10
.
.
.
Iterations to fastThrow :64587, StackTraceSize :20
Iterations to fastThrow :578, StackTraceSize :29
VM details
Java HotSpot(TM) 64-Bit Server VM (11.0.5+10-LTS) for bsd-amd64 JRE (11.0.5+10-LTS)
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly
What was surprising is why the JIT takes many more iterations to optimize when the stack trace has an even length.
I enabled JIT logs and analysed them via JITWatch, but couldn't see anything helpful, except that the C1 and C2 compilations seem to happen later for the even-sized stack traces.
The timeline looks something like this (looking at when java.lang.Throwable.getStackTrace() is compiled):
| StackSize | 10 | 11 |
|---------------|-------|-------|
| Queued for C1 | 1.099 | 1.012 |
| C1 | 1.318 | 1.162 |
| Queued for C2 | 1.446 | 1.192 |
| C2 | 1.495 | 1.325 |
Why exactly is this happening? And what heuristics does the JIT use for fast throw?
This effect is a result of the tricky tiered compilation and inlining policy.
Let me explain with a simplified example:
public class TestFastThrow {

    public static void main(String[] args) {
        for (int iteration = 0; ; iteration++) {
            try {
                throwsNPE(2);
            } catch (Exception e) {
                if (e.getStackTrace().length == 0) {
                    System.out.println("Iterations to fastThrow: " + iteration);
                    break;
                }
            }
        }
    }

    static void throwsNPE(int depth) {
        if (depth <= 1) {
            ((Object) null).getClass();
        }
        throwsNPE(depth - 1);
    }
}
For simplicity, I'll exclude all methods from compilation except throwsNPE:
-XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:+PrintCompilation
HotSpot uses Tiered Compilation by default. Here throwsNPE is first compiled at tier 3 (C1 with profiling). Profiling in C1 makes it possible to recompile the method later with C2.
The OmitStackTraceInFastThrow optimization works only in C2-compiled code. So, the sooner the code is compiled by C2, the fewer iterations pass before the loop finishes.
This is how profiling in C1-compiled code works: a counter is incremented on every method invocation and on every backward branch (there are no backward branches in the throwsNPE method, though). When the counter reaches a certain configurable threshold, the JVM compilation policy decides whether the current method needs to be recompiled.
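The counter-and-threshold mechanism can be modelled with a toy sketch (this is my own illustrative class, not HotSpot's actual counter implementation; it only assumes that a notification fires every 2^notifyFreqLog invocations):

```java
public class InvocationCounterModel {
    // Simplified model of a C1 profiling counter: incremented on every
    // invocation; the compilation policy is "notified" whenever the low
    // notifyFreqLog bits of the counter are all zero, i.e. once per
    // 2^notifyFreqLog invocations.
    private int count;

    boolean incrementAndNotify(int notifyFreqLog) {
        count++;
        int mask = (1 << notifyFreqLog) - 1;
        return (count & mask) == 0;
    }

    public static void main(String[] args) {
        InvocationCounterModel counter = new InvocationCounterModel();
        int notifications = 0;
        for (int i = 0; i < 4096; i++) {          // 2^12 invocations
            if (counter.incrementAndNotify(10)) { // freq log 10, as in Tier3InvokeNotifyFreqLog
                notifications++;
            }
        }
        System.out.println(notifications);        // 4096 / 2^10 = 4
    }
}
```

The point of the model: raising the frequency log by one halves how often the policy gets a chance to promote the method to C2.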
throwsNPE is a recursive method. HotSpot can inline recursive calls up to -XX:MaxRecursiveInlineLevel deep (the default value is 1).
How often C1-compiled code calls back into the JVM compilation policy differs for regular invocations vs. inlined invocations. A regular method notifies the JVM every 2^10 invocations (-XX:Tier3InvokeNotifyFreqLog=10), while an inlined method notifies the JVM much more rarely: every 2^20 invocations (-XX:Tier23InlineeNotifyFreqLog=20).
When the number of recursive calls is even, all invocations follow the Tier23InlineeNotifyFreqLog parameter. When the number of calls is odd, inlining does not cover the last leftover call, and this last invocation follows the Tier3InvokeNotifyFreqLog parameter.
This means that when the call depth is even, throwsNPE will be recompiled only after 2^20 calls, i.e. after 2^19 loop iterations (each iteration makes two calls). That's exactly what you'll see when you run the above code with throwsNPE(2):
Iterations to fastThrow: 524536
524536 is very close to 2^19 = 524288.
Now, if you run the same application with -XX:Tier23InlineeNotifyFreqLog=15, the number of iterations will be close to 2^14 = 16384:
Iterations to fastThrow: 16612
Now let's change the code to call throwsNPE(1). The program finishes very quickly, regardless of the Tier23InlineeNotifyFreqLog value, because a different option now governs the threshold. But if I rerun the program with -XX:Tier3InvokeNotifyFreqLog=20, the loop finishes no earlier than after 2^20 iterations:
Iterations to fastThrow: 1048994
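The even/odd arithmetic above can be condensed into a back-of-the-envelope estimate (my own illustrative helper, not HotSpot code; it assumes -XX:MaxRecursiveInlineLevel=1 and ignores compile-queue delays):

```java
public class FastThrowEstimate {
    // Rough estimate of loop iterations before C1-profiled throwsNPE
    // notifies the compilation policy and can be promoted to C2.
    static long estimateIterations(int callDepth,
                                   int tier3InvokeNotifyFreqLog,
                                   int tier23InlineeNotifyFreqLog) {
        if (callDepth % 2 != 0) {
            // Odd depth: one leftover non-inlined call per iteration
            // follows the (much smaller) regular-invocation threshold.
            return 1L << tier3InvokeNotifyFreqLog;
        }
        // Even depth: all calls are covered by inlining; the counter is
        // bumped callDepth times per iteration, threshold 2^inlineeFreqLog.
        return (1L << tier23InlineeNotifyFreqLog) / callDepth;
    }

    public static void main(String[] args) {
        // Defaults: Tier3InvokeNotifyFreqLog=10, Tier23InlineeNotifyFreqLog=20
        System.out.println(estimateIterations(2, 10, 20)); // 524288
        System.out.println(estimateIterations(1, 10, 20)); // 1024
    }
}
```

With the default flags this reproduces the observed orders of magnitude: about 2^19 iterations for throwsNPE(2) versus about 2^10 for throwsNPE(1).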
Summary
The fast throw optimization applies only to C2-compiled code. Due to one level of inlining (-XX:MaxRecursiveInlineLevel), C2 compilation is triggered either earlier (after 2^Tier3InvokeNotifyFreqLog invocations, if the number of recursive calls is odd) or later (after 2^Tier23InlineeNotifyFreqLog invocations, if all recursive calls are covered by inlining).