Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Java Optional performance increase with number of chained calls?

I was recently asked about the performance of java 8 Optional. After some searching, I found this question and several blog posts, with contradicting answers. So I benchmarked it using JMH and I don't understand my findings.

Here is the gist of my benchmark code (full code is available on GitHub):

@State(Scope.Benchmark)
public class OptionalBenchmark {

  private Room room;

  @Param({ "empty", "small", "large", "full" })
  private String filling;

  @Setup
  public void setUp () {
    switch (filling) {
      case "empty":
        room = null;
        break;
      case "small":
        room = new Room(new Flat(new Floor(null)));
        break;
      case "large":
        room = new Room(new Flat(new Floor(new Building(new Block(new District(null))))));
        break;
      case "full":
        room = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));
        break;
      default:
        throw new IllegalStateException("Unsupported filling.");
    }
  }

  @Benchmark
  public String nullChecks () {
    if (room == null) {
      return null;
    }

    Flat flat = room.getFlat();
    if (flat == null) {
      return null;
    }

    Floor floor = flat.getFloor();
    if (floor == null) {
      return null;
    }

    Building building = floor.getBuilding();
    if (building == null) {
      return null;
    }

    Block block = building.getBlock();
    if (block == null) {
      return null;
    }

    District district = block.getDistrict();
    if (district == null) {
      return null;
    }

    City city = district.getCity();
    if (city == null) {
      return null;
    }

    Country country = city.getCountry();
    if (country == null) {
      return null;
    }

    return country.getName();
  }

  @Benchmark
  public String optionalsWithMethodRefs () {
    return Optional.ofNullable (room)
        .map (Room::getFlat)
        .map (Flat::getFloor)
        .map (Floor::getBuilding)
        .map (Building::getBlock)
        .map (Block::getDistrict)
        .map (District::getCity)
        .map (City::getCountry)
        .map (Country::getName)
        .orElse (null);
  }

  @Benchmark
  public String optionalsWithLambdas () {
    return Optional.ofNullable (room)
        .map (room -> room.getFlat ())
        .map (flat -> flat.getFloor ())
        .map (floor -> floor.getBuilding ())
        .map (building -> building.getBlock ())
        .map (block -> block.getDistrict ())
        .map (district -> district.getCity ())
        .map (city -> city.getCountry ())
        .map (country -> country.getName ())
        .orElse (null);
  }

}

And the results I got were:

Benchmark                                  (filling)   Mode  Cnt           Score         Error  Units
OptionalBenchmark.nullChecks                   empty  thrpt  200   468835378.093 ±  895576.864  ops/s
OptionalBenchmark.nullChecks                   small  thrpt  200   306602013.907 ±  136966.520  ops/s
OptionalBenchmark.nullChecks                   large  thrpt  200   259996142.619 ±  307584.215  ops/s
OptionalBenchmark.nullChecks                    full  thrpt  200   275954974.981 ± 4154597.959  ops/s
OptionalBenchmark.optionalsWithLambdas         empty  thrpt  200   460491457.335 ±  322920.650  ops/s
OptionalBenchmark.optionalsWithLambdas         small  thrpt  200    98604468.453 ±   68320.074  ops/s
OptionalBenchmark.optionalsWithLambdas         large  thrpt  200    67648427.470 ±  206810.285  ops/s
OptionalBenchmark.optionalsWithLambdas          full  thrpt  200   167124820.392 ± 1229924.561  ops/s
OptionalBenchmark.optionalsWithMethodRefs      empty  thrpt  200   460690135.554 ±  273853.568  ops/s
OptionalBenchmark.optionalsWithMethodRefs      small  thrpt  200    98639064.680 ±   56848.805  ops/s
OptionalBenchmark.optionalsWithMethodRefs      large  thrpt  200    68138436.113 ±  158409.539  ops/s
OptionalBenchmark.optionalsWithMethodRefs       full  thrpt  200   169603006.971 ±   52646.423  ops/s

First of all, when given a null reference, Optional and null checks behave pretty much the same. I guess this is because there is only one instance of Optional.empty (), so any .map () method call on it just returns itself.

When the given object is non-null and contains a chain of non-null attributes, however, a new Optional has to be instantiated on each call to .map (). Hence, performance degrade much more quickly than with null checks. Makes sense. Expect for my full filling, where the performance all of a sudden increase. So what is the magic going on here? Am I doing something wrong in my benchmark?

Edit

The parameters from my first run were the default from JMH: each benchmark was ran in 10 different forks, with 20 warmup iterations of 1s each, and then 20 measurement iterations of 1s each. I believe those value are sane, since I trust the libraries I use. However, since I was told I wasn’t warming up enough, here is the result of a longer test (200 warmup iterations and 200 measurement iteration for each of the 10 forks):

# JMH version: 1.19
# VM version: JDK 1.8.0_152, VM 25.152-b16
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 200 iterations, 1 s each
# Measurement: 200 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

# Run complete. Total time: 17:49:25

Benchmark                                  (filling)   Mode   Cnt           Score         Error  Units
OptionalBenchmark.nullChecks                   empty  thrpt  2000   471803721.972 ±  116120.114  ops/s
OptionalBenchmark.nullChecks                   small  thrpt  2000   289181482.246 ± 3967502.916  ops/s
OptionalBenchmark.nullChecks                   large  thrpt  2000   260222478.406 ±  105074.121  ops/s
OptionalBenchmark.nullChecks                    full  thrpt  2000   282487728.710 ±   71214.637  ops/s
OptionalBenchmark.optionalsWithLambdas         empty  thrpt  2000   460931830.242 ±  335263.946  ops/s
OptionalBenchmark.optionalsWithLambdas         small  thrpt  2000    98688943.879 ±   20485.863  ops/s
OptionalBenchmark.optionalsWithLambdas         large  thrpt  2000    67262330.106 ±   50465.262  ops/s
OptionalBenchmark.optionalsWithLambdas          full  thrpt  2000   168070919.770 ±  352435.666  ops/s
OptionalBenchmark.optionalsWithMethodRefs      empty  thrpt  2000   460998599.579 ±   85063.337  ops/s
OptionalBenchmark.optionalsWithMethodRefs      small  thrpt  2000    98707338.408 ±   17231.648  ops/s
OptionalBenchmark.optionalsWithMethodRefs      large  thrpt  2000    68052673.021 ±   55285.427  ops/s
OptionalBenchmark.optionalsWithMethodRefs       full  thrpt  2000   169259067.479 ±  174402.212  ops/s

As you can see, we have almost the same figures.

like image 766
Étienne Miret Avatar asked Nov 13 '17 21:11

Étienne Miret


1 Answers

Even such a powerful tool like JMH is not able to save from all benchmarking pitfalls. I've found two different issues with this benchmark.

1.

HotSpot JIT compiler speculatively optimizes code basing on runtime profile. In the given "full" scenario Optional never sees null values. That's why Optional.ofNullable method (also called by Optional.map) happens to be optimized exclusively for non-null path which constructs a new non-empty Optional. In this case JIT is able to eliminate all short-lived allocations and perform all map operations without intermediate objects.

public static <T> Optional<T> ofNullable(T value) {
    return value == null ? empty() : of(value);
}

In "small" and "large" scenarios the mapping sequence finally ends with Optional.empty(). That is, both branches of ofNullable method are compiled, and JIT is no longer able to eliminate allocations of intermediate Optional objects - data flow graph appears to be too complex for Escape Analysis to succeed.

Check it by running JMH with -prof gc, and you'll see that "small" allocates 48 bytes (3 Optionals) per iteration, "large" allocates 96 bytes (6 Optionals), and "full" allocates nothing.

Benchmark                                                      (filling)  Mode  Cnt     Score     Error   Units
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      empty  avgt    5    ≈ 10⁻⁶              B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      small  avgt    5    48,000 ±   0,001    B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      large  avgt    5    96,000 ±   0,001    B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm       full  avgt    5    ≈ 10⁻⁵              B/op

If you replace new Country("France") with new Country(null), the opimization will also break, and "full" scenario will become expectedly slower than "small" and "large".

Alternatively, the following dummy loop added to setUp will also prevent from overoptimizing ofNullable making the benchmark results more realistic.

    for (int i = 0; i < 1000; i++) {
        Optional.ofNullable(null);
    }

2.

Surprisingly, nullChecks benchmark also appears faster in "full" scenario. The reason here is class initialization barriers. Note that only "full" case initializes all related classes. In "small" and "large" cases nullChecks method refers to some classes that are not yet initialized. This prevents from compiling nullChecks efficiently.

If you explicitly initialize all the classes in setUp, e.g. by creating a dummy object, then "empty", "small" and "large" scenarios of nullChecks will become faster.

Room dummy = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))))
like image 154
apangin Avatar answered Oct 30 '22 11:10

apangin