I was recently asked about the performance of java 8 Optional. After some searching, I found this question and several blog posts, with contradicting answers. So I benchmarked it using JMH and I don't understand my findings.
Here is the gist of my benchmark code (full code is available on GitHub):
@State(Scope.Benchmark)
public class OptionalBenchmark {
private Room room;
@Param({ "empty", "small", "large", "full" })
private String filling;
@Setup
public void setUp () {
switch (filling) {
case "empty":
room = null;
break;
case "small":
room = new Room(new Flat(new Floor(null)));
break;
case "large":
room = new Room(new Flat(new Floor(new Building(new Block(new District(null))))));
break;
case "full":
room = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));
break;
default:
throw new IllegalStateException("Unsupported filling.");
}
}
@Benchmark
public String nullChecks () {
if (room == null) {
return null;
}
Flat flat = room.getFlat();
if (flat == null) {
return null;
}
Floor floor = flat.getFloor();
if (floor == null) {
return null;
}
Building building = floor.getBuilding();
if (building == null) {
return null;
}
Block block = building.getBlock();
if (block == null) {
return null;
}
District district = block.getDistrict();
if (district == null) {
return null;
}
City city = district.getCity();
if (city == null) {
return null;
}
Country country = city.getCountry();
if (country == null) {
return null;
}
return country.getName();
}
@Benchmark
public String optionalsWithMethodRefs () {
return Optional.ofNullable (room)
.map (Room::getFlat)
.map (Flat::getFloor)
.map (Floor::getBuilding)
.map (Building::getBlock)
.map (Block::getDistrict)
.map (District::getCity)
.map (City::getCountry)
.map (Country::getName)
.orElse (null);
}
@Benchmark
public String optionalsWithLambdas () {
return Optional.ofNullable (room)
.map (room -> room.getFlat ())
.map (flat -> flat.getFloor ())
.map (floor -> floor.getBuilding ())
.map (building -> building.getBlock ())
.map (block -> block.getDistrict ())
.map (district -> district.getCity ())
.map (city -> city.getCountry ())
.map (country -> country.getName ())
.orElse (null);
}
}
And the results I got were:
Benchmark (filling) Mode Cnt Score Error Units
OptionalBenchmark.nullChecks empty thrpt 200 468835378.093 ± 895576.864 ops/s
OptionalBenchmark.nullChecks small thrpt 200 306602013.907 ± 136966.520 ops/s
OptionalBenchmark.nullChecks large thrpt 200 259996142.619 ± 307584.215 ops/s
OptionalBenchmark.nullChecks full thrpt 200 275954974.981 ± 4154597.959 ops/s
OptionalBenchmark.optionalsWithLambdas empty thrpt 200 460491457.335 ± 322920.650 ops/s
OptionalBenchmark.optionalsWithLambdas small thrpt 200 98604468.453 ± 68320.074 ops/s
OptionalBenchmark.optionalsWithLambdas large thrpt 200 67648427.470 ± 206810.285 ops/s
OptionalBenchmark.optionalsWithLambdas full thrpt 200 167124820.392 ± 1229924.561 ops/s
OptionalBenchmark.optionalsWithMethodRefs empty thrpt 200 460690135.554 ± 273853.568 ops/s
OptionalBenchmark.optionalsWithMethodRefs small thrpt 200 98639064.680 ± 56848.805 ops/s
OptionalBenchmark.optionalsWithMethodRefs large thrpt 200 68138436.113 ± 158409.539 ops/s
OptionalBenchmark.optionalsWithMethodRefs full thrpt 200 169603006.971 ± 52646.423 ops/s
First of all, when given a null reference, Optional and null checks behave pretty much the same. I guess this is because there is only one instance of Optional.empty ()
, so any .map ()
method call on it just returns itself.
When the given object is non-null and contains a chain of non-null attributes, however, a new Optional has to be instantiated on each call to .map ()
. Hence, performance degrade much more quickly than with null checks. Makes sense. Expect for my full
filling, where the performance all of a sudden increase. So what is the magic going on here? Am I doing something wrong in my benchmark?
The parameters from my first run were the default from JMH: each benchmark was ran in 10 different forks, with 20 warmup iterations of 1s each, and then 20 measurement iterations of 1s each. I believe those value are sane, since I trust the libraries I use. However, since I was told I wasn’t warming up enough, here is the result of a longer test (200 warmup iterations and 200 measurement iteration for each of the 10 forks):
# JMH version: 1.19
# VM version: JDK 1.8.0_152, VM 25.152-b16
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 200 iterations, 1 s each
# Measurement: 200 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Run complete. Total time: 17:49:25
Benchmark (filling) Mode Cnt Score Error Units
OptionalBenchmark.nullChecks empty thrpt 2000 471803721.972 ± 116120.114 ops/s
OptionalBenchmark.nullChecks small thrpt 2000 289181482.246 ± 3967502.916 ops/s
OptionalBenchmark.nullChecks large thrpt 2000 260222478.406 ± 105074.121 ops/s
OptionalBenchmark.nullChecks full thrpt 2000 282487728.710 ± 71214.637 ops/s
OptionalBenchmark.optionalsWithLambdas empty thrpt 2000 460931830.242 ± 335263.946 ops/s
OptionalBenchmark.optionalsWithLambdas small thrpt 2000 98688943.879 ± 20485.863 ops/s
OptionalBenchmark.optionalsWithLambdas large thrpt 2000 67262330.106 ± 50465.262 ops/s
OptionalBenchmark.optionalsWithLambdas full thrpt 2000 168070919.770 ± 352435.666 ops/s
OptionalBenchmark.optionalsWithMethodRefs empty thrpt 2000 460998599.579 ± 85063.337 ops/s
OptionalBenchmark.optionalsWithMethodRefs small thrpt 2000 98707338.408 ± 17231.648 ops/s
OptionalBenchmark.optionalsWithMethodRefs large thrpt 2000 68052673.021 ± 55285.427 ops/s
OptionalBenchmark.optionalsWithMethodRefs full thrpt 2000 169259067.479 ± 174402.212 ops/s
As you can see, we have almost the same figures.
Even such a powerful tool like JMH is not able to save from all benchmarking pitfalls. I've found two different issues with this benchmark.
HotSpot JIT compiler speculatively optimizes code basing on runtime profile. In the given "full" scenario Optional
never sees null
values. That's why Optional.ofNullable
method (also called by Optional.map
) happens to be optimized exclusively for non-null path which constructs a new non-empty Optional
. In this case JIT is able to eliminate all short-lived allocations and perform all map
operations without intermediate objects.
public static <T> Optional<T> ofNullable(T value) {
return value == null ? empty() : of(value);
}
In "small" and "large" scenarios the mapping sequence finally ends with Optional.empty()
. That is, both branches of ofNullable
method are compiled, and JIT is no longer able to eliminate allocations of intermediate Optional
objects - data flow graph appears to be too complex for Escape Analysis to succeed.
Check it by running JMH with -prof gc
, and you'll see that "small" allocates 48 bytes (3 Optionals) per iteration, "large" allocates 96 bytes (6 Optionals), and "full" allocates nothing.
Benchmark (filling) Mode Cnt Score Error Units
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm empty avgt 5 ≈ 10⁻⁶ B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm small avgt 5 48,000 ± 0,001 B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm large avgt 5 96,000 ± 0,001 B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm full avgt 5 ≈ 10⁻⁵ B/op
If you replace new Country("France")
with new Country(null)
, the opimization will also break, and "full" scenario will become expectedly slower than "small" and "large".
Alternatively, the following dummy loop added to setUp
will also prevent from overoptimizing ofNullable
making the benchmark results more realistic.
for (int i = 0; i < 1000; i++) {
Optional.ofNullable(null);
}
Surprisingly, nullChecks
benchmark also appears faster in "full" scenario. The reason here is class initialization barriers. Note that only "full" case initializes all related classes. In "small" and "large" cases nullChecks
method refers to some classes that are not yet initialized. This prevents from compiling nullChecks
efficiently.
If you explicitly initialize all the classes in setUp
, e.g. by creating a dummy object, then "empty", "small" and "large" scenarios of nullChecks
will become faster.
Room dummy = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With