Could someone point me in a direction that might help me understand why the JIT deoptimizes my loop (OSR)? It looks like it gets compiled once by C1 and is then deoptimized multiple times (I can see tens or maybe hundreds of log entries starting with <deoptimized...>).
This is the class which contains that important loop:
import java.util.Queue;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import org.jctools.queues.SpscUnboundedArrayQueue;

@SynchronizationRequired
public class Worker implements Runnable
{
    private static final byte NOT_RUNNING = 0, RUNNING = 1, SHUTDOWN = 2, FORCE_SHUTDOWN = 3;

    private static final AtomicIntegerFieldUpdater<Worker> isRunningFieldUpdater =
        AtomicIntegerFieldUpdater.newUpdater(Worker.class, "isRunning");

    private volatile int isRunning = NOT_RUNNING;

    private final Queue<FunkovConnection> tasks = new SpscUnboundedArrayQueue<>(512);

    /**
     * Executes tasks from the queue until closed.
     */
    @Override
    public void run()
    {
        if (isRunning())
        {
            return;
        }
        while (notClosed())
        {
            FunkovConnection connection = tasks.poll();
            if (null != connection)
            {
                connection.run();
            }
        }
        if (forceShutdown())
        {
            setNonRunning();
            return;
        }
        FunkovConnection connection;
        while ((connection = tasks.poll()) != null)
        {
            connection.run();
        }
        setNonRunning();
    }

    public void submit(FunkovConnection connection)
    {
        tasks.add(connection);
    }

    /**
     * Shuts the worker down after it finishes processing all pending tasks on its queue.
     */
    public void shutdown()
    {
        isRunningFieldUpdater.compareAndSet(this, RUNNING, SHUTDOWN);
    }

    /**
     * Shuts the worker down after it finishes the currently running task. Pending tasks on the queue are not handled.
     */
    public void shutdownForce()
    {
        isRunningFieldUpdater.compareAndSet(this, RUNNING, FORCE_SHUTDOWN);
    }

    private void setNonRunning()
    {
        isRunningFieldUpdater.set(this, NOT_RUNNING);
    }

    private boolean forceShutdown()
    {
        return isRunningFieldUpdater.get(this) == FORCE_SHUTDOWN;
    }

    private boolean isRunning()
    {
        return isRunningFieldUpdater.getAndSet(this, RUNNING) == RUNNING;
    }

    public boolean notClosed()
    {
        return isRunningFieldUpdater.get(this) == RUNNING;
    }
}
JIT logs:
1. <task_queued compile_id='535' compile_kind='osr' method='Worker run ()V' bytes='81' count='1' backedge_count='60416' iicount='1' osr_bci='8' level='3' stamp='0,145' comment='tiered' hot_count='60416'/>
2. <nmethod compile_id='535' compile_kind='osr' compiler='c1' level='3' entry='0x00007fabf5514ee0' size='5592' address='0x00007fabf5514c10' relocation_offset='344' insts_offset='720' stub_offset='4432' scopes_data_offset='4704' scopes_pcs_offset='5040' dependencies_offset='5552' nul_chk_table_offset='5560' oops_offset='4624' metadata_offset='4640' method='Worker run ()V' bytes='81' count='1' backedge_count='65742' iicount='1' stamp='0,146'/>
3. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='68801' iicount='1'/>
</deoptimized>
4. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='76993' iicount='1'/>
</deoptimized>
5. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='85185' iicount='1'/>
</deoptimized>
6. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='93377' iicount='1'/>
</deoptimized>
Two questions here:
When a thread deoptimizes, it stops executing a compiled method at some point in the method and resumes execution in the same Java method at the exact same point, but in the interpreter. Why would a thread stop executing compiled code to switch to much slower interpreted code?
I'm glad that @aran's suggestion helped in your case, however, it's just a lucky coincidence. After all, JIT inlining options affect many things, including the compilation order, timings, and so on. In fact, deoptimization has nothing to do with inlining.
I was able to reproduce your problem, and here is my analysis.
We see in the HotSpot sources that the <deoptimized> message is printed by the Deoptimization::deoptimize_single_frame function. Let's engage async-profiler to find where this function is called from. To do so, add the following JVM option:
-agentlib:asyncProfiler=start,event=Deoptimization::deoptimize_single_frame,file=deopt.html
Here is the relevant part of the output:
So, the deoptimization reason is the Runtime1::counter_overflow function. A method compiled by C1 at tier 3 counts invocations and backward branches (loop iterations). Every 2^Tier3BackedgeNotifyFreqLog iterations the method calls Runtime1::counter_overflow to decide whether it should be recompiled at a higher tier.
In your logs we see that backedge_count increments by exactly 8192 (2^13), and the bytecode at index 37 is a goto corresponding to the while (notClosed()) loop.
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='76993' iicount='1'/>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='85185' iicount='1'/>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='93377' iicount='1'/>
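As a sanity check, the increments between the consecutive backedge_count values logged above can be computed directly (class name is hypothetical; the counts are taken verbatim from the logs):

```java
public class BackedgeDelta {
    public static void main(String[] args) {
        // backedge_count values from the consecutive <deoptimized> log entries
        long[] counts = {68801, 76993, 85185, 93377};
        for (int i = 1; i < counts.length; i++) {
            // Each delta is 8192 = 2^13, matching Tier3BackedgeNotifyFreqLog
            System.out.println(counts[i] - counts[i - 1]);
        }
    }
}
```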
When the counter overflows (every 8192 iterations), the JVM checks whether an OSR-compiled method for the given bytecode index is ready (it might not be ready yet, since JIT compilation runs in the background). If the JVM finds such a method, it performs an OSR transition by deoptimizing the current frame and replacing it with the corresponding OSR method.
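The OSR mechanism itself is easy to observe on a minimal example. The sketch below (class name hypothetical) contains a loop hot enough to overflow the backedge counter; running it with -XX:+PrintCompilation should show an OSR compilation of main, which HotSpot marks with a % sign:

```java
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // Hot loop: enough backedges to overflow the counters and trigger
        // an OSR compilation of main() while it is still executing.
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        System.out.println(sum); // prints 499999500000
    }
}
```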
It turns out that in your example the JVM finds an existing OSR method compiled at tier 3. Basically, it deoptimizes the Worker.run frame compiled at tier 3 and replaces it with exactly the same method! This repeats again and again until C2 finishes its background job. Then Worker.run is replaced by the tier 4 compilation, and everything becomes fine.
Of course, this should not normally happen. It is actually a JVM bug, JDK-8253118. It has been fixed in JDK 16 and will likely be backported to JDK 11u. I have verified that the excessive deoptimization does not happen with JDK 16 Early-Access builds.
As @apangin comments, this was a lucky shot. If you want to know what really happens, do not waste time reading this answer.
-- pic of me while answering
Although the JIT compiler may inline aggressively in some cases, it still has its own time constraints and won't inline if that would take significant time. As a result, a method is eligible for inlining only if its bytecode size is less than 35 bytes (by default).
In your case, the method is 81 bytes in size, hence not eligible:
<jvms bci='37' method='Worker run ()V' bytes='81' ...
Java Performance: The Definitive Guide by Scott Oaks
The basic decision about whether to inline a method depends on how hot it is and its size. The JVM determines if a method is hot (i.e., called frequently) based on an internal calculation; it is not directly subject to any tunable parameters. If a method is eligible for inlining because it is called frequently, then it will be inlined only if its bytecode size is less than 325 bytes (or whatever is specified as the -XX:MaxFreqInlineSize=N flag). Otherwise, it is eligible for inlining only if it is small: less than 35 bytes (or whatever is specified as the -XX:MaxInlineSize=N flag).
In order to get your methods inlined, you could raise the inline size limit by specifying -XX:MaxInlineSize=N on the command line. As a test, you could try something like
-XX:MaxInlineSize=90
to check whether those methods are now inlined.
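An alternative to raising the limit is shrinking the hot method instead: extracting the loop body into a small helper keeps the helper well under the 35-byte default, making it eligible for inlining on its own. This is only a sketch with a plain Runnable queue, not the asker's actual Worker class:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical refactoring sketch: the per-task work lives in a tiny,
// separately inlinable method instead of one 81-byte run() body.
public class SplitLoopSketch {
    private final Queue<Runnable> tasks = new ArrayDeque<>();

    public void submit(Runnable task) {
        tasks.add(task);
    }

    /** Drains the queue, returning the number of tasks executed. */
    public int drain() {
        int processed = 0;
        while (pollOnce()) {
            processed++;
        }
        return processed;
    }

    // Small helper (far below the 35-byte default MaxInlineSize threshold).
    private boolean pollOnce() {
        Runnable task = tasks.poll();
        if (task == null) {
            return false;
        }
        task.run();
        return true;
    }
}
```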
To the reader: the suggestion above "fixed" the issue (not really fixed). How? No idea. I was even given the correct answer tick lol
(I'll just leave this here as it seems cool)
--
Workload characterization of JVM languages
Aibek S., Lukas S., Lubomír B., Andreas S., Andrej P., Yudi Z., Walter B.