The code below calls two simple functions 10 billion times each.
import java.util.Arrays;
import java.util.List;

public class PerfTest {
    private static long l = 0;

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b");
        long time1 = System.currentTimeMillis();
        // 10 billion calls, each passing two String references
        for (long i = 0; i < 1E10; i++) {
            func1("a", "b");
        }
        long time2 = System.currentTimeMillis();
        // 10 billion calls, each passing one List reference
        for (long i = 0; i < 1E10; i++) {
            func2(list);
        }
        // Prints "<func1 millis>/<func2 millis>"
        System.out.println((time2 - time1) + "/" + (System.currentTimeMillis() - time2));
    }

    private static void func1(String s1, String s2) { l++; }

    private static void func2(List<String> sl) { l++; }
}
My assumption was that the performance of these two calls would be close to identical. If anything, I would have guessed that passing two arguments would be slightly slower than passing one. Given that all arguments are object references, I wasn't expecting the fact that one was a list to make any difference.
I have run the test many times and a typical result is "12781/30536". In other words, the calls using two strings take about 13 seconds and the calls using a list take about 30 seconds.
What is the explanation for this difference in performance? Or is this an unfair test? I have tried switching the order of the two loops (in case it was due to startup effects), but the results are the same.
Update
This is not a fair test for many reasons. However, it does demonstrate real behaviour of the Java JIT compiler. Each of the following two changes makes the two function calls perform the same: adding s1.getClass() to func1 and sl.getClass() to func2, or running with -XX:-TieredCompilation.

The explanation for this behaviour is in the accepted answer below. The very brief summary of @apangin's answer is that func2 is not inlined by the HotSpot compiler because the class of its argument (i.e. List) is not resolved. Forcing resolution of the class (e.g. using getClass) causes it to be inlined, which significantly improves its performance. As pointed out in the answer, unresolved classes are unlikely to occur in real code, which makes this code an unrealistic edge case.
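For reference, the change described in this update can be sketched as follows. This is a reconstruction, not the poster's exact code: only the s1.getClass() and sl.getClass() calls come from the update, while the class name PerfTestResolved and the run/counter helpers are hypothetical additions that make the iteration count configurable and the call count checkable.

```java
import java.util.Arrays;
import java.util.List;

public class PerfTestResolved {
    private static long l = 0;

    // As in the question, plus s1.getClass() (the update's first addition).
    private static void func1(String s1, String s2) {
        s1.getClass();
        l++;
    }

    // As in the question, plus sl.getClass(); per the update and the accepted
    // answer, this forces List to be resolved so func2 can be inlined.
    private static void func2(List<String> sl) {
        sl.getClass();
        l++;
    }

    // Hypothetical helper: runs both loops and returns {func1 millis, func2 millis}.
    static long[] run(long iterations) {
        List<String> list = Arrays.asList("a", "b");
        long t0 = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            func1("a", "b");
        }
        long t1 = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            func2(list);
        }
        long t2 = System.currentTimeMillis();
        return new long[] {t1 - t0, t2 - t1};
    }

    // Hypothetical accessor so the total number of calls can be checked.
    static long counter() {
        return l;
    }

    public static void main(String[] args) {
        // The question used 1E10 iterations; pass a smaller count for a quick run.
        long n = args.length > 0 ? Long.parseLong(args[0]) : 10_000_000_000L;
        long[] t = run(n);
        System.out.println(t[0] + "/" + t[1]);
    }
}
```

Absolute timings depend on the JVM version and hardware, so the two numbers are best compared with each other and with the original PerfTest on the same machine.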
The benchmark is unfair; however, it has revealed an interesting effect.

As Sotirios Delimanolis has noticed, the performance difference is caused by the fact that func1 is inlined by the HotSpot compiler, while func2 is not. The reason is that func2 has an argument of type List, a class that has never been resolved during execution of the benchmark.
Note that the List class is not actually used: no List methods are called, no fields of type List are declared, and there are no class casts or any other actions that typically cause class resolution. If you add a usage of the List class anywhere in the code, func2 will be inlined.
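One way to try this claim is to call a single List method before the loops while leaving func1 and func2 untouched. A minimal sketch, assuming that one list.size() call is enough: it is an invokeinterface on java.util.List, so executing it makes the JVM resolve the List class reference of this class. The class name PerfTestListUsed and the run/counter helpers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PerfTestListUsed {
    private static long l = 0;

    // Unchanged from the question.
    private static void func1(String s1, String s2) { l++; }

    // Unchanged from the question: the List parameter is still never used here.
    private static void func2(List<String> sl) { l++; }

    // Hypothetical helper: runs both loops and returns {func1 millis, func2 millis}.
    static long[] run(long iterations) {
        List<String> list = Arrays.asList("a", "b");
        // The only change: one call to a List method (result deliberately ignored).
        // Executing it resolves java.util.List for this class, which, according
        // to the answer, is what func2 needs in order to be inlined.
        list.size();
        long t0 = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            func1("a", "b");
        }
        long t1 = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            func2(list);
        }
        long t2 = System.currentTimeMillis();
        return new long[] {t1 - t0, t2 - t1};
    }

    // Hypothetical accessor so the total number of calls can be checked.
    static long counter() {
        return l;
    }

    public static void main(String[] args) {
        long n = args.length > 0 ? Long.parseLong(args[0]) : 10_000_000_000L;
        long[] t = run(n);
        System.out.println(t[0] + "/" + t[1]);
    }
}
```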
The other circumstance that affected the compilation strategy is the simplicity of the method. It is so simple that the JVM has decided to compile it in Tier 1 (C1 with no further optimization). If it were compiled with C2, the List class would be resolved. Try running with -XX:-TieredCompilation, and you'll see that func2 is successfully inlined and performs as fast as func1.
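When comparing runs with and without -XX:-TieredCompilation, it can help if the program reports which mode it actually ran in. A minimal sketch, assuming a HotSpot JVM: com.sun.management.HotSpotDiagnosticMXBean is HotSpot-specific rather than part of the Java SE API, and the class name TieredCheck is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class TieredCheck {
    // Returns the current value of the TieredCompilation VM option as a string
    // ("true" or "false"). Throws IllegalArgumentException if the running JVM
    // does not know this option.
    static String tieredCompilation() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("TieredCompilation").getValue();
    }

    public static void main(String[] args) {
        System.out.println("TieredCompilation = " + tieredCompilation());
    }
}
```

For example, launching it as java -XX:-TieredCompilation TieredCheck should report false, while a default launch on a recent HotSpot JVM typically reports true.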
Writing realistic microbenchmarks manually is a really difficult job. There are so many aspects that may lead to confusing results, e.g. inlining, dead code elimination, on-stack replacement, profile pollution, recompilation, etc. That's why it is highly recommended to use proper benchmarking tools like JMH. A hand-written benchmark can easily fool the JVM. In particular, real applications are very unlikely to have methods with classes that are never used.