I have this piece of code doing Range Minimum Query. When t = 100000, i and j are always changed in each input line, its execution time in Java 8u60 is about 12 secs.
for (int a0 = 0; a0 < t; a0++) {
String line = reader.readLine();
String[] ls = line.split(" ");
int i = Integer.parseInt(ls[0]);
int j = Integer.parseInt(ls[1]);
int min = width[i];
for (int k = i + 1; k <= j; k++) {
if (min > width[k]) {
min = width[k];
}
}
writer.write(min + "");
writer.newLine();
}
When I extract a new method to find minimum Value, the execution time is 4 times faster (about 2.5 secs).
for (int a0 = 0; a0 < t; a0++) {
String line = reader.readLine();
String[] ls = line.split(" ");
int i = Integer.parseInt(ls[0]);
int j = Integer.parseInt(ls[1]);
int min = getMin(i, j);
writer.write(min + "");
writer.newLine();
}
private int getMin(int i, int j) {
int min = width[i];
for (int k = i + 1; k <= j; k++) {
if (min > width[k]) {
min = width[k];
}
}
return min;
}
I always thought that method calls are slow. But this example shows the opposite. Java 6 also demonstrates this, but the execution times is much slower in both case (17 secs and 10 secs). Can someone provide some insight into this?
TL;DR JIT compiler has more opportunities to optimize the inner loop in the second case, because on-stack replacement happens at the different point.
I've managed to reproduce the problem with the reduced test case.
No I/O or string operations involved, just two nested loops with array access.
public class NestedLoop {
private static final int ARRAY_SIZE = 5000;
private static final int ITERATIONS = 1000000;
private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray();
public long inline() {
long sum = 0;
for (int i = 0; i < ITERATIONS; i++) {
int min = width[0];
for (int k = 1; k < ARRAY_SIZE; k++) {
if (min > width[k]) {
min = width[k];
}
}
sum += min;
}
return sum;
}
public long methodCall() {
long sum = 0;
for (int i = 0; i < ITERATIONS; i++) {
int min = getMin();
sum += min;
}
return sum;
}
private int getMin() {
int min = width[0];
for (int k = 1; k < ARRAY_SIZE; k++) {
if (min > width[k]) {
min = width[k];
}
}
return min;
}
public static void main(String[] args) {
long startTime = System.nanoTime();
long sum = new NestedLoop().inline(); // or .methodCall();
long endTime = System.nanoTime();
long ms = (endTime - startTime) / 1000000;
System.out.println("sum = " + sum + ", time = " + ms + " ms");
}
}
inline
variant indeed works 3-4 times slower than methodCall
.
I've used the following JVM options to confirm that both benchmarks are compiled on the highest tier and OSR (on-stack replacement) successfully occurs in both cases.
-XX:-TieredCompilation
-XX:CompileOnly=NestedLoop
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+TraceNMethodInstalls
251 46 % NestedLoop::inline @ 21 (70 bytes)
Installing osr method (4) NestedLoop.inline()J @ 21
271 46 NestedLoop::getMin (41 bytes)
Installing method (4) NestedLoop.getMin()I
274 47 % NestedLoop::getMin @ 9 (41 bytes)
Installing osr method (4) NestedLoop.getMin()I @ 9
314 48 % NestedLoop::methodCall @ 4 (30 bytes)
Installing osr method (4) NestedLoop.methodCall()J @ 4
This means JIT does its job, but the generated code must be different.
Let's analyze it with -XX:+PrintAssembly
.
0x0000000002df4dd0: inc %ebp ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114}
;*goto
; - NestedLoop::inline@53 (line 12)
0x0000000002df4dd2: test %eax,-0x1d64dd8(%rip) # 0x0000000001090000
;*iload
; - NestedLoop::inline@21 (line 12)
; {poll}
0x0000000002df4dd8: cmp $0x1388,%ebp
0x0000000002df4dde: jge 0x0000000002df4dfd ;*if_icmpge
; - NestedLoop::inline@26 (line 12)
0x0000000002df4de0: test %rbx,%rbx
0x0000000002df4de3: je 0x0000000002df4e4c
0x0000000002df4de5: mov (%r11),%r10d ;*getfield width
; - NestedLoop::inline@32 (line 13)
0x0000000002df4de8: mov 0xc(%r10),%r9d ; implicit exception
0x0000000002df4dec: cmp %r9d,%ebp
0x0000000002df4def: jae 0x0000000002df4e59
0x0000000002df4df1: mov 0x10(%r10,%rbp,4),%r8d ;*iaload
; - NestedLoop::inline@37 (line 13)
0x0000000002df4df6: cmp %r8d,%r13d
0x0000000002df4df9: jg 0x0000000002df4dc6 ;*if_icmple
; - NestedLoop::inline@38 (line 13)
0x0000000002df4dfb: jmp 0x0000000002df4dd0
0x0000000002da2af0: add $0x8,%edx ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2af3: cmp $0x1381,%edx
0x0000000002da2af9: jge 0x0000000002da2b70 ;*iload_1
; - NestedLoop::getMin@16 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2afb: mov 0x10(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b00: cmp %r11d,%ecx
0x0000000002da2b03: jg 0x0000000002da2b6b ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b05: mov 0x14(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b0a: cmp %r11d,%ecx
0x0000000002da2b0d: jg 0x0000000002da2b5c ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b0f: mov 0x18(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b14: cmp %r11d,%ecx
0x0000000002da2b17: jg 0x0000000002da2b4d ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b19: mov 0x1c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b1e: cmp %r11d,%ecx
0x0000000002da2b21: jg 0x0000000002da2b66 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b23: mov 0x20(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b28: cmp %r11d,%ecx
0x0000000002da2b2b: jg 0x0000000002da2b61 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b2d: mov 0x24(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b32: cmp %r11d,%ecx
0x0000000002da2b35: jg 0x0000000002da2b52 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b37: mov 0x28(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b3c: cmp %r11d,%ecx
0x0000000002da2b3f: jg 0x0000000002da2b57 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b41: mov 0x2c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b46: cmp %r11d,%ecx
0x0000000002da2b49: jg 0x0000000002da2ae6 ;*if_icmple
; - NestedLoop::getMin@23 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b4b: jmp 0x0000000002da2af0
The compiled code is completely different; methodCall
is optimized much better.
width
field is cached in the register.In contrast, inline
variant
width
array from memory every time;OSR-compiled methods are not always optimized very well, because they have to maintain the state of an interpreted stack frame at the transition point. Here is another example of the same problem.
On-stack replacement usually occurs on backward branches (i.e. at the bottom of the loop). inline
method has two nested loops, and OSR happens inside the inner loop, while methodCall
has just one outer loop. OSR transition in the outer loop in more favourable, because JIT compiler has more freedom to optimize the inner loop. And this is what exactly happens in your case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With