As you may know, Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE. To guard against a negative return value, the following safeAbs method was implemented in my project:
public static int safeAbs(int i) {
    i = Math.abs(i);
    return i < 0 ? 0 : i;
}
I compared the performance with the following one:
public static int safeAbs(int i) {
    return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
}
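For completeness, here is a small standalone check (not part of the benchmark itself; the class name SafeAbsCheck is just for illustration, and the method names match the benchmark further down) showing that both variants agree on the boundary values, including the Integer.MIN_VALUE edge case:
public class SafeAbsCheck {
    static int safeAbsSlow(int i) {
        i = Math.abs(i);
        return i < 0 ? 0 : i;
    }

    static int safeAbsFast(int i) {
        return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
    }

    public static void main(String[] args) {
        // Probe the boundary values plus a few ordinary ones.
        int[] probes = {Integer.MIN_VALUE, Integer.MIN_VALUE + 1, -1, 0, 1, Integer.MAX_VALUE};
        for (int p : probes) {
            System.out.printf("%d -> slow=%d, fast=%d%n", p, safeAbsSlow(p), safeAbsFast(p));
        }
    }
}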
The first variant turned out to be almost 6 times slower than the second (the second performs about the same as a "pure" Math.abs(int) call). As far as I can tell there is no significant difference in the bytecode, so I guess the difference lies in the JIT-compiled assembly code:
"slow" version:
0x00007f0149119720: mov %eax,0xfffffffffffec000(%rsp)
0x00007f0149119727: push %rbp
0x00007f0149119728: sub $0x20,%rsp
0x00007f014911972c: test %esi,%esi
0x00007f014911972e: jl 0x7f0149119734
0x00007f0149119730: mov %esi,%eax
0x00007f0149119732: jmp 0x7f014911973c
0x00007f0149119734: neg %esi
0x00007f0149119736: test %esi,%esi
0x00007f0149119738: jl 0x7f0149119748
0x00007f014911973a: mov %esi,%eax
0x00007f014911973c: add $0x20,%rsp
0x00007f0149119740: pop %rbp
0x00007f0149119741: test %eax,0x1772e8b9(%rip) ; {poll_return}
0x00007f0149119747: retq
0x00007f0149119748: mov %esi,(%rsp)
0x00007f014911974b: mov $0xffffff65,%esi
0x00007f0149119750: nop
0x00007f0149119753: callq 0x7f01490051a0 ; OopMap{off=56}
;*ifge
; - math.FastAbs::safeAbsSlow@6 (line 16)
; {runtime_call}
0x00007f0149119758: callq 0x7f015f521d20 ; {runtime_call}
"normal" version:
# {method} {0x00007f31acf28cd8} 'safeAbsFast' '(I)I' in 'math/FastAbs'
# parm0: rsi = int
# [sp+0x30] (sp of caller)
0x00007f31b08c7360: mov %eax,0xfffffffffffec000(%rsp)
0x00007f31b08c7367: push %rbp
0x00007f31b08c7368: sub $0x20,%rsp
0x00007f31b08c736c: cmp $0x80000000,%esi
0x00007f31b08c7372: je 0x7f31b08c738e
0x00007f31b08c7374: mov %esi,%r10d
0x00007f31b08c7377: neg %r10d
0x00007f31b08c737a: test %esi,%esi
0x00007f31b08c737c: mov %esi,%eax
0x00007f31b08c737e: cmovl %r10d,%eax
0x00007f31b08c7382: add $0x20,%rsp
0x00007f31b08c7386: pop %rbp
0x00007f31b08c7387: test %eax,0x162c2c73(%rip) ; {poll_return}
0x00007f31b08c738d: retq
0x00007f31b08c738e: mov %esi,(%rsp)
0x00007f31b08c7391: mov $0xffffff65,%esi
0x00007f31b08c7396: nop
0x00007f31b08c7397: callq 0x7f31b07b11a0 ; OopMap{off=60}
;*if_icmpne
; - math.FastAbs::safeAbsFast@3 (line 17)
; {runtime_call}
0x00007f31b08c739c: callq 0x7f31c5863d20 ; {runtime_call}
Benchmark code:
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = {"-Xms3g", "-Xmx3g", "-server"})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
public class SafeAbsMicroBench {

    @State(Scope.Benchmark)
    public static class Data {
        final int len = 10_000_000;
        final int[] values = new int[len];

        @Setup(Level.Trial)
        public void setup() {
            // prepare 10 million random ints, excluding Integer.MIN_VALUE
            for (int i = 0; i < len; i++) {
                int val;
                do {
                    val = ThreadLocalRandom.current().nextInt();
                } while (val == Integer.MIN_VALUE);
                values[i] = val;
            }
        }
    }

    @Benchmark
    public int safeAbsSlow(Data data) {
        int sum = 0;
        for (int i = 0; i < data.len; i++)
            sum += safeAbsSlow(data.values[i]);
        return sum;
    }

    @Benchmark
    public int safeAbsFast(Data data) {
        int sum = 0;
        for (int i = 0; i < data.len; i++)
            sum += safeAbsFast(data.values[i]);
        return sum;
    }

    private int safeAbsSlow(int i) {
        i = Math.abs(i);
        return i < 0 ? 0 : i;
    }

    private int safeAbsFast(int i) {
        return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
    }

    public static void main(String[] args) throws RunnerException {
        final Options options = new OptionsBuilder()
                .include(SafeAbsMicroBench.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}
Results (Linux x86-64, 7820HQ; checked on Oracle JDK 8 and 11 with very similar results):
Benchmark                      Mode  Cnt         Score        Error  Units
SafeAbsMicroBench.safeAbsFast  avgt   10   6435155.516 ±  47130.767  ns/op
SafeAbsMicroBench.safeAbsSlow  avgt   10  35646411.744 ± 776173.621  ns/op
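Normalized by the 10,000,000 array elements processed per invocation, that is roughly 6,435,155 / 10,000,000 ≈ 0.64 ns per element for safeAbsFast versus 35,646,411 / 10,000,000 ≈ 3.56 ns per element for safeAbsSlow, i.e. about a 5.5x gap per call.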
Can someone explain why the first version is significantly slower than the second one?
There is a difference in the generated native code for the safeAbsSlow and safeAbsFast methods.
safeAbsSlow (C2, level 4):
0x0000023d12ec4b14: add eax,ecx
0x0000023d12ec4b16: inc ebx
0x0000023d12ec4b18: cmp ebx,989680h
0x0000023d12ec4b1e: jnl 23d12ec4b4eh ; jump if `ebx` was not less than `10_000_000`
0x0000023d12ec4b20: mov ecx,dword ptr [r9+rbx*4+10h]
0x0000023d12ec4b25: test ecx,ecx
0x0000023d12ec4b27: jnl 23d12ec4b14h ; jump if `ecx` was not less-than `0`
0x0000023d12ec4b29: neg ecx
0x0000023d12ec4b2b: test ecx,ecx
0x0000023d12ec4b2d: jnl 23d12ec4b14h ; jump if `ecx` was not less-than `0`
safeAbsFast (C2, level 4):
0x000001d89e8a4b20: mov ecx,dword ptr [r9+rdi*4+10h]
0x000001d89e8a4b25: cmp ecx,80000000h
0x000001d89e8a4b2b: je 1d89e8a4b66h ; jump if `ecx` was equal to `2147483648`
0x000001d89e8a4b2d: mov r11d,ecx
0x000001d89e8a4b30: neg r11d
0x000001d89e8a4b33: test ecx,ecx
0x000001d89e8a4b35: cmovl ecx,r11d
0x000001d89e8a4b39: add eax,ecx
0x000001d89e8a4b3b: inc edi
0x000001d89e8a4b3d: cmp edi,989680h
0x000001d89e8a4b43: jl 1d89e8a4b20h ; jump if `edi` was less than `10_000_000`
As we can see from the above, safeAbsSlow has more conditional jumps than safeAbsFast. This is mainly because the Math.abs implementation inlined into safeAbsFast contains no conditional jumps at all:
0x000001d89e8a4b2d: mov r11d,ecx
0x000001d89e8a4b30: neg r11d
0x000001d89e8a4b33: test ecx,ecx
0x000001d89e8a4b35: cmovl ecx,r11d
As a result, there are many more branch misses in the slow version than in the normal version when the data set contains both positive and negative values scattered across the array. Below are the corresponding statistics, collected with the perf Linux profiler:
Benchmark                   Mode  Cnt          Score          Error  Units
safeAbsFast                 avgt   10    9611659.726 ±  1429082.431  ns/op
safeAbsFast:branch-misses   avgt            2869.853                  #/op
safeAbsFast:branches        avgt        12492918.020                  #/op
safeAbsFast:cycles          avgt        28212203.936                  #/op
safeAbsFast:instructions    avgt        92352048.153                  #/op
safeAbsSlow                 avgt   10   44524180.366 ±  6324887.086  ns/op
safeAbsSlow:branch-misses   avgt         5006493.144                  #/op
safeAbsSlow:branches        avgt        17496069.911                  #/op
safeAbsSlow:cycles          avgt       126413171.674                  #/op
safeAbsSlow:instructions    avgt        67549877.558                  #/op
In contrast, here is the result for the sorted data set:
Benchmark                   Mode  Cnt          Score          Error  Units
safeAbsFast                 avgt   10    9026800.584 ±   528992.157  ns/op
safeAbsFast:branch-misses   avgt            2785.463                  #/op
safeAbsFast:branches        avgt        12474751.905                  #/op
safeAbsFast:cycles          avgt        27379727.603                  #/op
safeAbsFast:instructions    avgt        92418075.715                  #/op
safeAbsSlow                 avgt   10    6981828.374 ±  2375480.834  ns/op
safeAbsSlow:branch-misses   avgt            2801.022                  #/op
safeAbsSlow:branches        avgt        17496585.992                  #/op
safeAbsSlow:cycles          avgt        19478382.113                  #/op
safeAbsSlow:instructions    avgt        67589946.278                  #/op
The previously slow version becomes even faster than the other one when the data set is sorted, since the costly branch misses are minimized in that case.
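The sorted data set above was presumably produced by a small tweak to the setup method; a sketch of such a change (my illustration, not necessarily the exact code used for these numbers) could look like this:
@Setup(Level.Trial)
public void setup() {
    for (int i = 0; i < len; i++) {
        int val;
        do {
            val = ThreadLocalRandom.current().nextInt();
        } while (val == Integer.MIN_VALUE);
        values[i] = val;
    }
    // Sorting groups all negative values together, so the sign check in
    // safeAbsSlow becomes an almost perfectly predictable branch.
    java.util.Arrays.sort(values);
}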
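As a side note, the data-dependent branch can also be removed at the source level with the classic shift-and-xor absolute-value trick. This is only an illustrative sketch (not part of the question or of the measurements above) and would need to be benchmarked the same way to confirm it behaves like the cmov-based code:
// Branch-free safeAbs: an illustrative sketch, not part of the original benchmark.
public static int safeAbsBranchless(int i) {
    int mask = i >> 31;            // 0 for i >= 0, -1 (all bits set) for i < 0
    int abs  = (i ^ mask) - mask;  // two's-complement abs; Integer.MIN_VALUE stays MIN_VALUE
    return abs & ~(abs >> 31);     // clamp the only still-negative result (MIN_VALUE) to 0
}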
Environment:
openjdk version "12-internal" 2019-03-19
OpenJDK Runtime Environment (slowdebug build 12-internal+0-adhoc.jdk12)
OpenJDK 64-Bit Server VM (slowdebug build 12-internal+0-adhoc.jdk12, mixed mode)