I'm trying to write a warm-up routine for a latency-sensitive Java application in order to optimize the first few transactions, which would otherwise be slowed down mainly by dynamic class loading and JIT compilation.
The problem I'm facing is that even though my warm-up code loads all classes and exercises them by calling them many times (at least 100 times the -XX:CompileThreshold value), when the actual user later logs on these same functions are still marked as "not entrant" and compiled again, which causes a latency hit.
The JVM flags are as follows (I only added -XX:+PrintCompilation and -verbose:class to troubleshoot; the others are legacy):
-Xms5g -Xmx5g -server -XX:+AggressiveHeap -XX:+UseFastAccessorMethods -XX:+PrintGCDetails -XX:CompileThreshold=100 -XX:-CITime -XX:-PrintGC -XX:-PrintGCTimeStamps -XX:+PrintCompilation -verbose:class
#Warmup happens here
12893 2351 my.test.application.hotSpot (355 bytes)
#Real user logs on here
149755 2351 made not entrant my.test.application.hotSpot (355 bytes)
151913 2837 my.test.application.hotSpot (355 bytes)
152079 2351 made zombie my.test.application.hotSpot (355 bytes)
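To spot such invalidations without reading the whole log by eye, the output can be scanned mechanically. What follows is only a sketch: the class name DeoptScan and the regular expression are made up for illustration, and the pattern assumes the simplified column layout of the excerpt above (timestamp, compile ID, optional state change, method, size). Real -XX:+PrintCompilation output may carry extra flag columns, in which case the pattern would need adjusting. The code sticks to Java 6 syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of any JDK tool): reports which compile IDs
// were "made not entrant", grouped by method name.
final class DeoptScan {

    // Assumed layout, taken from the excerpt above:
    //   <timestamp> <compile id> [made not entrant|made zombie] <method> (<n> bytes)
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+)\\s+(\\d+)\\s+(?:(made not entrant|made zombie)\\s+)?(\\S+)\\s+\\((\\d+) bytes\\)$");

    // Returns method name -> compile IDs that were made not entrant, in log order.
    static Map<String, List<Integer>> notEntrant(List<String> lines) {
        Map<String, List<Integer>> result = new LinkedHashMap<String, List<Integer>>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches() || !"made not entrant".equals(m.group(3))) {
                continue; // plain compilation, zombie notice, or unrecognized line
            }
            String method = m.group(4);
            List<Integer> ids = result.get(method);
            if (ids == null) {
                ids = new ArrayList<Integer>();
                result.put(method, ids);
            }
            ids.add(Integer.valueOf(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // The four log lines quoted in the question.
        List<String> log = Arrays.asList(
                "12893 2351 my.test.application.hotSpot (355 bytes)",
                "149755 2351 made not entrant my.test.application.hotSpot (355 bytes)",
                "151913 2837 my.test.application.hotSpot (355 bytes)",
                "152079 2351 made zombie my.test.application.hotSpot (355 bytes)");
        for (Map.Entry<String, List<Integer>> e : notEntrant(log).entrySet()) {
            System.out.println(e.getKey() + " invalidated compile ids: " + e.getValue());
        }
    }
}
```

Feeding it only the part of the log written after the warm-up finished would list exactly the methods the warm-up failed to stabilize.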
No class loading happens after the warmup (I can see the class loading before though so the flag is working).
It would appear that the function gets a new compile ID (2351 vs 2837), which means that the JVM somehow deems it "different".
And how can I determine why the JVM decided to recompile this function?
I guess that boils down to: how can I determine why the ID changed? What are the criteria?
I tried marking as many methods and classes as private as I could but to no avail.
This is JRE 1.6.0_45-b06.
Any tips on how to troubleshoot or get more info are appreciated! :)
Answer 2: The JVM keeps the JIT-compiled code for each method in C heap, linked via tables in the internal representation of the class object. The code is only deleted when the JVM ends or in the rare case that the class is "unloaded". It's all managed by suitable magic.
By default, the number of compiler threads (-XX:CICompilerCount) is set to 2 for the server JVM and to 1 for the client JVM, and it scales with the number of cores if tiered compilation is used. -XX:ReservedCodeCacheSize sets the maximum code cache size (in bytes) for JIT-compiled code; this option is equivalent to -Xmaxjitcodesize.
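As a rough way to see how much of the code cache is actually in use, the standard java.lang.management API can be queried from inside the application. This is a sketch under assumptions: the class name CodeCacheReport is made up, and the name filter is a guess that works because HotSpot names the pool "Code Cache" on older JVMs and "CodeHeap ..." on newer ones; other JVMs may name it differently or not expose it at all.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes the memory pools that look like the JIT code cache.
final class CodeCacheReport {

    static List<String> describeCodePools() {
        List<String> out = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: code cache pools are non-heap and have "Code" in their name.
            boolean looksLikeCodeCache =
                    pool.getType() == MemoryType.NON_HEAP && pool.getName().contains("Code");
            if (!looksLikeCodeCache) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the maximum is undefined.
            out.add(pool.getName() + ": used=" + usage.getUsed()
                    + " bytes, max=" + usage.getMax() + " bytes");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeCodePools()) {
            System.out.println(line);
        }
    }
}
```

Printing this once after the warm-up and again under load gives a first hint whether the cache is anywhere near its limit.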
For posterity, it was fairly simple once I read some of the source code of the HotSpot JVM.
The following flags would point out the exact source code line that led to a function being de-optimized and re-compiled:
-XX:+TraceDeoptimization -XX:+WizardMode -XX:+PrintNativeNMethods -XX:+PrintDependencies -XX:+DebugDeoptimization -XX:+LogEvents
Usually it was an if-statement like this:
void function(Object object) {
    if (object == null) {
        // do some uncommon cleanup or initialization
    }
    do_stuff();
}
Let's say my warmup code never triggered the if statement.
I had assumed that the whole function would be compiled in one go. However, when the C2 JIT compiler actually does decide to produce native code for this function, it will not generate any code for the body of the if-statement, because that code path has never been taken.
It only generates a conditional branch that leads to a trap (HotSpot calls this an "uncommon trap"), which hands the method back to the runtime instead of running compiled code for that path. I think this happens because the native code cache was/is fairly small, so the JVM writers did not want to fill it with potentially useless code.
Anyway, if the condition is ever true (i.e. the object is ever null), the function will immediately and unconditionally hit this trap, be de-optimized, and be re-compiled (leading to a freeze/latency hit on the order of a couple of ms).
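A small program along these lines can be used to observe the effect in isolation. It is a sketch, not code from the application: UncommonTrapDemo, process and the iteration count are made up. Whether the log then shows "made not entrant" for process itself or for a caller it was inlined into depends on the JVM version and flags, so treat the result as something to check rather than a guarantee.

```java
// Hypothetical reproducer; run with: java -XX:+PrintCompilation UncommonTrapDemo
final class UncommonTrapDemo {

    static int cleanups = 0;

    // Same shape as the function above: a branch the warm-up never takes.
    static int process(Object object) {
        if (object == null) {
            cleanups++;      // rare path, not exercised during warm-up
            return -1;
        }
        return object.hashCode() & 1;
    }

    // Calls process() often enough to get it JIT-compiled, always with a non-null argument.
    static long warmUp(int iterations) {
        long sink = 0;       // accumulated so the loop is not optimized away
        Object payload = "payload";
        for (int i = 0; i < iterations; i++) {
            sink += process(payload);
        }
        return sink;
    }

    public static void main(String[] args) {
        long sink = warmUp(200000);
        System.out.println("warm-up done, sink=" + sink);

        // The first "production" call takes the branch the compiled code never saw.
        int result = process(null);
        System.out.println("rare path result=" + result + ", cleanups=" + cleanups);

        // Keep calling so any re-compilation has a chance to show up in the log.
        sink += warmUp(200000);
        System.out.println("second phase done, sink=" + sink);
    }
}
```

Comparing the -XX:+PrintCompilation output before and after the "rare path" line is the quickest way to see whether the compiled version was thrown away.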
Of course my warmup code would not call each function in the exact same way as production and I would venture to guess that in any complex product this is close to impossible and a maintenance nightmare anyway.
What this means is that to warm up a Java application effectively, every single branch of every if-statement in the code needs to be executed by the warm-up code.
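To give an idea of what that requirement looks like in practice, here is a minimal sketch of a warm-up that also checks its own branch coverage. Everything in it is hypothetical: the WarmupCoverage class, the hand-written branchHits counters and the sample inputs are for illustration only, and the counters are not thread-safe. In a real code base a coverage tool would have to take over that bookkeeping.

```java
// Hypothetical sketch: verifies that the warm-up inputs reach both sides of the if-statement.
final class WarmupCoverage {

    // Made-up branch IDs for the single if-statement in function().
    static final int OBJECT_NULL = 0;
    static final int OBJECT_PRESENT = 1;
    static final long[] branchHits = new long[2];

    static void function(Object object) {
        if (object == null) {
            branchHits[OBJECT_NULL]++;      // uncommon cleanup or initialization
        } else {
            branchHits[OBJECT_PRESENT]++;   // the usual path
        }
    }

    // Replays every warm-up input the given number of times.
    static void warmUp(Object[] inputs, int rounds) {
        for (int r = 0; r < rounds; r++) {
            for (Object input : inputs) {
                function(input);
            }
        }
    }

    // True only if every instrumented branch ran at least once.
    static boolean allBranchesHit() {
        for (long hits : branchHits) {
            if (hits == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        warmUp(new Object[] { "order-1" }, 20000);
        System.out.println("non-null inputs only, all branches hit: " + allBranchesHit());

        warmUp(new Object[] { "order-1", null }, 20000);
        System.out.println("with a null input, all branches hit: " + allBranchesHit());
    }
}
```

Even in this toy case the warm-up inputs have to be chosen with knowledge of the code's internals, which is where the maintenance burden comes from.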
And so we are going to simply abandon the idea of "warming up" our Java code because it is not as simple as some would believe.
For the following reasons, we are going to re-write parts of the application to support being run for weeks/months at a time:
Long-term the customer will likely pay for a rewrite in C/C++ or the like to get consistently low-latency but that's for another day.
EDIT: Let me just add that updating to a newer version of the HotSpot JVM or "tuning" the HotSpot JVM parameters will never resolve this issue. Both are smoke and mirrors. The fact is that the HotSpot JVM was never written for predictable low latency, and this shortcoming is impossible to work around from within Java userland.