Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Odd Performance (Boxed Integers)

I've run into a case where I thought the JIT should have an easy time optimising, but it does not seem to.

I've reduced the problem to a minimal example:

Consider a class IntArrayWrapper:

class IntArrayWrapper {
    private int[] data = new int[100000];
    public void setInteger(int i, Integer x) { data[i] = x; }
    public void setInt    (int i, int x)     { data[i] = x; }
}

The only difference between the two methods is that x is an Integer (boxed) or an int (primitive).

I've written some JMH benchmarks to measure the performance of these two methods:

@Benchmark
public void bmarkSetIntConst() {
    final IntArrayWrapper w = new IntArrayWrapper();
    for (int i = 0; i < 100000; i++) {
        w.setInt(i, 100);
    }
}

@Benchmark
public void bmarkSetIntStair() {
    final IntArrayWrapper w = new IntArrayWrapper();
    for (int i = 0; i < 100000; i++) {
        w.setInt(i, i);
    }
}

// omitted: bmarkSetIntegerConst and bmarkSetIntStair that use .setInteger(..)

Expected Results

What I expected to see was:

  • setIntegerConst equal to setIntConst. THIS IS TRUE.
  • setIntegerStair equal to setIntStair. THIS IS NOT TRUE.

The reason I thought was that I think that the JIT should inline the setInteger call, and realise that there is an auto-boxing operation (from the call) directly followed by an unboxing operation (from the array assignment) and therefore be able to remove the boxing/unboxing.

This does not seem to be the case.

Some observations

  • The const operations perform equally well, I think this is due to the fact that the hard-coded integer is cached.
  • It's strange that setIntStair and setIntConst have different performance, I have a feeling that the JIT might produce SIMD code here, but I'd be grateful for any insights.

Results

These are the results, the whole code is here: https://gist.github.com/kaeluka/fe1210074038424c30db7a52ac5c2d7b

Benchmark                             Mode  Cnt      Score     Error  Units
MyBenchmark.bmarkSetIntConst         thrpt   20  15717.814 ± 362.137  ops/s
MyBenchmark.bmarkSetIntegerConst     thrpt   20  15814.296 ± 657.945  ops/s
MyBenchmark.bmarkSetIntStair         thrpt   20  11941.879 ± 200.335  ops/s
MyBenchmark.bmarkSetIntegerStair     thrpt   20   2981.398 ±  48.806  ops/s
MyBenchmark.bmarkSetIntSawtooth      thrpt   20  11072.882 ± 234.686  ops/s
MyBenchmark.bmarkSetIntegerSawtooth  thrpt   20  11105.272 ± 156.496  ops/s

Questions

  • Why is the JIT not able to elide boxing?
  • Is there a way to fix this without changing the interface of setInteger to take an int? (My original code uses generics, so int is not an option unless I want to duplicate a lot of code).

Edits

Added results for bmarkSetIntegerSawtooth and bmarkSetIntSawtooth that set the value to i % 128 to measure the impact of object pooling for Integers.

like image 502
Stephan Brandauer Avatar asked Apr 26 '18 14:04

Stephan Brandauer


1 Answers

Why is the JIT not able to elide boxing?

I would guess that the JIT does not specifically target boxing operations, but relies on regular escape analysis to eliminate unnecessary boxing. Escape analysis is rather picky about data flow, and I suspect the problem is that some of your boxing operations hit the Integer cache. Potentially pulling values out of the cache is probably what's preventing the boxing from being eliminated.

I've modified your test in two ways, and measured the results of each. The results seem to confirm my hypothesis.

  1. I first tried rewriting your benchmarks to use boxed double values instead of int values, as double boxing does not involve any caching.

  2. I then went back to the int-based benchmarks, but modified your loops to start at i = 128 so that none of your boxing operations ever hit the cache.

In both cases, the performance gap closed to within the margin of error.

To confirm, I enabled -XX:+PrintAssembly to see how my modified benchmarks were getting compiled. For each pair of benchmarks, the boxing and primitive variants had nearly identical instruction sequences. There were only minor differences, e.g., a pair of instructions flipped. It definitely looked like the boxing was optimized away.

Workaround: Since bypassing the cache seems to avoid the issue, and there is no way force an empty integer cache, one workaround would be to replace implicit boxing with new Integer(i). Note, however, that if escape analysis doesn't replace the allocations (due to hitting one of the various compiler thresholds), then your performance may actually degrade.


Modified Benchmarks:

class IntArrayWrapper {
    private int[] data = new int[100000];
    void setBoxed(int i, Integer x) { data[i] = x; }
    void setUnboxed(int i, int x) { data[i] = x; }
}

class DoubleArrayWrapper {
    private double[] data = new double[100000];
    void setBoxed(int i, Double x) { data[i] = x; }
    void setUnboxed(int i, double x) { data[i] = x; }
}

@State(Scope.Benchmark)
public class BoxingBenchmarks {
    @Benchmark
    public void intBoxed() {
        final IntArrayWrapper w = new IntArrayWrapper();
        for (int i = 128; i < 100000; i++) w.setBoxed(i, i);
    }

    @Benchmark
    public void intUnboxed() {
        final IntArrayWrapper w = new IntArrayWrapper();
        for (int i = 128; i < 100000; i++) w.setUnboxed(i, i);
    }

    @Benchmark
    public void doubleBoxed() {
        final DoubleArrayWrapper w = new DoubleArrayWrapper();
        for (int i = 0; i < 100000; i++) w.setBoxed(i, (double) i);
    }

    @Benchmark
    public void doubleUnboxed() {
        final DoubleArrayWrapper w = new DoubleArrayWrapper();
        for (int i = 0; i < 100000; i++) w.setUnboxed(i, (double) i);
    }
}

Results:

Benchmark                        Mode  Cnt      Score      Error  Units
BoxingBenchmarks.doubleBoxed    thrpt    5   6513.760 ± 1075.605  ops/s
BoxingBenchmarks.doubleUnboxed  thrpt    5   6883.235 ±  414.803  ops/s
BoxingBenchmarks.intBoxed       thrpt    5  10902.200 ±  315.437  ops/s
BoxingBenchmarks.intUnboxed     thrpt    5  11148.648 ±  935.877  ops/s
like image 110
Mike Strobel Avatar answered Oct 28 '22 02:10

Mike Strobel