In a recent discussion about how to optimize some code, I was told that breaking code up into lots of small methods can significantly increase performance, because the JIT compiler doesn't like to optimize large methods.
I wasn't sure about this since it seems that the JIT compiler should itself be able to identify self-contained segments of code, irrespective of whether they are in their own method or not.
Can anyone confirm or refute this claim?
The JIT compiler helps improve the performance of Java programs by compiling bytecodes into native machine code at run time. The JIT compiler is enabled by default. When a method has been compiled, the JVM calls the compiled code of that method directly instead of interpreting it.
To help the JIT compiler analyze the method, its bytecodes are first reformulated in an internal representation called trees, which resembles machine code more closely than bytecodes. Analysis and optimizations are then performed on the trees of the method. At the end, the trees are translated into native code.
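To see this compilation happening, here is a minimal sketch (the class and method names are invented for illustration): a tiny hot method called in a loop. Run it with -XX:+PrintCompilation and a log line for JitWarmup::square appears once the JVM decides the method is hot enough to compile; after that point, calls go to the compiled code instead of the interpreter.

```java
// A minimal sketch (hypothetical class/method names) to observe JIT compilation.
// Run with: java -XX:+PrintCompilation JitWarmup
public class JitWarmup {

    // A tiny hot method; called often enough, HotSpot compiles it to native code.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i % 100);
        }
        System.out.println(sum); // prints 3283500000
    }
}
```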
The Hotspot JIT only inlines methods that are less than a certain (configurable) size. So using smaller methods allows more inlining, which is good.
See the various inlining options on this page.
EDIT
To elaborate a little:
Example (full code to have the same line numbers if you try it)
package javaapplication27;

public class TestInline {
    private int count = 0;

    public static void main(String[] args) throws Exception {
        TestInline t = new TestInline();
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += t.m();
        }
        System.out.println(sum);
    }

    public int m() {
        int i = count;
        if (i % 10 == 0) {
            i += 1;
        } else if (i % 10 == 1) {
            i += 2;
        } else if (i % 10 == 2) {
            i += 3;
        }
        i += count;
        i *= count;
        i++;
        return i;
    }
}
When running this code with the following JVM flags: -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:FreqInlineSize=50 -XX:MaxInlineSize=50 -XX:+PrintInlining
(Yes, I have used values that prove my case: m is too big, but both the refactored m and m2 are below the threshold - with other values you might get a different output.)
You will see that m() and main() get compiled, but m() does not get inlined:
 56   1       javaapplication27.TestInline::m (62 bytes)
 57   1 %     javaapplication27.TestInline::main @ 12 (53 bytes)
                @ 20   javaapplication27.TestInline::m (62 bytes)   too big
You can also inspect the generated assembly to confirm that m is not inlined (I used these JVM flags: -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel) - it will look like this:
0x0000000002780624: int3 ;*invokevirtual m ; - javaapplication27.TestInline::main@20 (line 10)
If you refactor the code like this (I have extracted the if/else in a separate method):
public int m() {
    int i = count;
    i = m2(i);
    i += count;
    i *= count;
    i++;
    return i;
}

public int m2(int i) {
    if (i % 10 == 0) {
        i += 1;
    } else if (i % 10 == 1) {
        i += 2;
    } else if (i % 10 == 2) {
        i += 3;
    }
    return i;
}
You will see the following compilation actions:
 60   1       javaapplication27.TestInline::m (30 bytes)
 60   2       javaapplication27.TestInline::m2 (40 bytes)
                @ 7    javaapplication27.TestInline::m2 (40 bytes)   inline (hot)
 63   1 %     javaapplication27.TestInline::main @ 12 (53 bytes)
                @ 20   javaapplication27.TestInline::m (30 bytes)    inline (hot)
                  @ 7    javaapplication27.TestInline::m2 (40 bytes)   inline (hot)
So m2 gets inlined into m, as you would expect, so we are back to the original scenario. But when main gets compiled, it actually inlines the whole thing. At the assembly level, this means you won't find any invokevirtual instructions any more. Instead you will find lines like this:
0x00000000026d0121: add ecx,edi ;*iinc ; - javaapplication27.TestInline::m2@7 (line 33) ; - javaapplication27.TestInline::m@7 (line 24) ; - javaapplication27.TestInline::main@20 (line 10)
where the comments show the chain of inlined frames: this instruction comes from m2, which was inlined into m, which was in turn inlined into main.
Conclusion
I am not saying that this example is representative, but it does seem to demonstrate a few points.
And finally: if a portion of your code is critical enough for performance that these considerations matter, you should examine the JIT output to fine-tune your code - and, most importantly, profile before and after.
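For the "profile before and after" part, a crude sketch is shown below (the class and numbers are made up for illustration; for any serious measurement use a proper harness such as JMH, because naive nanoTime loops are easily distorted by warmup, on-stack replacement, and dead-code elimination):

```java
// Naive timing sketch (hypothetical example; not a substitute for JMH).
public class NaiveBench {

    static int work(int x) {
        return (x % 10 == 0) ? x + 1 : x + 2;
    }

    public static void main(String[] args) {
        // Warm up first, so we time compiled code rather than the interpreter.
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) sink += work(i);

        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) sink += work(i);
        long elapsed = System.nanoTime() - start;

        // Print the accumulated result so the loop cannot be optimized away.
        System.out.println(sink + " in " + elapsed / 1_000_000 + " ms");
    }
}
```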
If you take the exact same code and just break it up into lots of small methods, that is not going to help the JIT at all.
A better way to put it is that modern HotSpot JVMs do not penalize you for writing a lot of small methods. Small methods get aggressively inlined, so at runtime you do not really pay the cost of the extra function calls. This is true even for virtual calls, including calls made through an interface.
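As a sketch of that claim (the interface and class names here are invented): a call through an interface is a virtual dispatch at the bytecode level, but when HotSpot observes only one receiver type at a call site, it typically devirtualizes and inlines it. Running this with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining should report the apply call as inlined:

```java
// Hypothetical example of a monomorphic interface call site.
// Run with: java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MonomorphicCall
public class MonomorphicCall {

    interface Adder {
        int apply(int x);
    }

    // The only implementation that ever reaches the call site below,
    // so the site is monomorphic and a good candidate for inlining.
    static final class PlusOne implements Adder {
        public int apply(int x) { return x + 1; }
    }

    static int run(Adder a) {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += a.apply(i % 10); // interface call, devirtualized when monomorphic
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(new PlusOne())); // prints 5500000
    }
}
```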
I wrote a blog post several years ago that describes how you can see whether the JVM is inlining methods. The technique is still applicable to modern JVMs. I also found it useful to look at the discussions around invokedynamic, where how modern HotSpot JVMs compile Java bytecode is discussed extensively.