I've run into a case where I thought the JIT should have an easy time optimising, but it does not seem to.
I've reduced the problem to a minimal example:
Consider a class IntArrayWrapper
:
class IntArrayWrapper {
private int[] data = new int[100000];
public void setInteger(int i, Integer x) { data[i] = x; }
public void setInt (int i, int x) { data[i] = x; }
}
The only difference between the two methods is that x
is an Integer
(boxed) or an int
(primitive).
I've written some JMH benchmarks to measure the performance of these two methods:
@Benchmark
public void bmarkSetIntConst() {
final IntArrayWrapper w = new IntArrayWrapper();
for (int i = 0; i < 100000; i++) {
w.setInt(i, 100);
}
}
@Benchmark
public void bmarkSetIntStair() {
final IntArrayWrapper w = new IntArrayWrapper();
for (int i = 0; i < 100000; i++) {
w.setInt(i, i);
}
}
// omitted: bmarkSetIntegerConst and bmarkSetIntStair that use .setInteger(..)
What I expected to see was:
setIntegerConst
equal to setIntConst
. THIS IS TRUE.
setIntegerStair
equal to setIntStair
. THIS IS NOT TRUE.
The reason I thought was that I think that the JIT should inline the setInteger
call, and realise that there is an auto-boxing operation (from the call) directly followed by an unboxing operation (from the array assignment) and therefore be able to remove the boxing/unboxing.
This does not seem to be the case.
setIntStair
and setIntConst
have different performance, I have a feeling that the JIT might produce SIMD code here, but I'd be grateful for any insights.These are the results, the whole code is here: https://gist.github.com/kaeluka/fe1210074038424c30db7a52ac5c2d7b
Benchmark Mode Cnt Score Error Units
MyBenchmark.bmarkSetIntConst thrpt 20 15717.814 ± 362.137 ops/s
MyBenchmark.bmarkSetIntegerConst thrpt 20 15814.296 ± 657.945 ops/s
MyBenchmark.bmarkSetIntStair thrpt 20 11941.879 ± 200.335 ops/s
MyBenchmark.bmarkSetIntegerStair thrpt 20 2981.398 ± 48.806 ops/s
MyBenchmark.bmarkSetIntSawtooth thrpt 20 11072.882 ± 234.686 ops/s
MyBenchmark.bmarkSetIntegerSawtooth thrpt 20 11105.272 ± 156.496 ops/s
setInteger
to take an int
? (My original code uses generics, so int
is not an option unless I want to duplicate a lot of code).Added results for bmarkSetIntegerSawtooth
and bmarkSetIntSawtooth
that set the value to i % 128
to measure the impact of object pooling for Integer
s.
Why is the JIT not able to elide boxing?
I would guess that the JIT does not specifically target boxing operations, but relies on regular escape analysis to eliminate unnecessary boxing. Escape analysis is rather picky about data flow, and I suspect the problem is that some of your boxing operations hit the Integer
cache. Potentially pulling values out of the cache is probably what's preventing the boxing from being eliminated.
I've modified your test in two ways, and measured the results of each. The results seem to confirm my hypothesis.
I first tried rewriting your benchmarks to use boxed double
values instead of int
values, as double
boxing does not involve any caching.
I then went back to the int
-based benchmarks, but modified your loops to start at i = 128
so that none of your boxing operations ever hit the cache.
In both cases, the performance gap closed to within the margin of error.
To confirm, I enabled -XX:+PrintAssembly
to see how my modified benchmarks were getting compiled. For each pair of benchmarks, the boxing and primitive variants had nearly identical instruction sequences. There were only minor differences, e.g., a pair of instructions flipped. It definitely looked like the boxing was optimized away.
Workaround: Since bypassing the cache seems to avoid the issue, and there is no way force an empty integer cache, one workaround would be to replace implicit boxing with new Integer(i)
. Note, however, that if escape analysis doesn't replace the allocations (due to hitting one of the various compiler thresholds), then your performance may actually degrade.
Modified Benchmarks:
class IntArrayWrapper {
private int[] data = new int[100000];
void setBoxed(int i, Integer x) { data[i] = x; }
void setUnboxed(int i, int x) { data[i] = x; }
}
class DoubleArrayWrapper {
private double[] data = new double[100000];
void setBoxed(int i, Double x) { data[i] = x; }
void setUnboxed(int i, double x) { data[i] = x; }
}
@State(Scope.Benchmark)
public class BoxingBenchmarks {
@Benchmark
public void intBoxed() {
final IntArrayWrapper w = new IntArrayWrapper();
for (int i = 128; i < 100000; i++) w.setBoxed(i, i);
}
@Benchmark
public void intUnboxed() {
final IntArrayWrapper w = new IntArrayWrapper();
for (int i = 128; i < 100000; i++) w.setUnboxed(i, i);
}
@Benchmark
public void doubleBoxed() {
final DoubleArrayWrapper w = new DoubleArrayWrapper();
for (int i = 0; i < 100000; i++) w.setBoxed(i, (double) i);
}
@Benchmark
public void doubleUnboxed() {
final DoubleArrayWrapper w = new DoubleArrayWrapper();
for (int i = 0; i < 100000; i++) w.setUnboxed(i, (double) i);
}
}
Results:
Benchmark Mode Cnt Score Error Units
BoxingBenchmarks.doubleBoxed thrpt 5 6513.760 ± 1075.605 ops/s
BoxingBenchmarks.doubleUnboxed thrpt 5 6883.235 ± 414.803 ops/s
BoxingBenchmarks.intBoxed thrpt 5 10902.200 ± 315.437 ops/s
BoxingBenchmarks.intUnboxed thrpt 5 11148.648 ± 935.877 ops/s
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With