I'm trying to write a method like this:
static boolean fitsInDouble(long x) {
    // return true if x can be represented
    // as a numerically-equivalent double
}
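For example, the boundary sits at 2^53 (a quick illustration of my own, assuming the standard IEEE-754 double with its 53-bit significand):

long fits = 1L << 53;           // 9007199254740992: exactly representable
long doesNotFit = fits + 1;     // 9007199254740993: not representable
System.out.println((long) (double) fits == fits);             // true
System.out.println((long) (double) doesNotFit == doesNotFit); // false, rounds back to 2^53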
I'm trying to find the most efficient implementation. I settled on one, but then a coworker ran the same benchmarks and got different relative results: the fastest implementation for me isn't the fastest for him.
Is there something wrong with these benchmarks?
package rnd;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
public class Benchmarks {

    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
                .include(Benchmarks.class.getName())
                .build();
        new Runner(options).run();
    }

    @Benchmark
    public void bigDecimal(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(bigDecimal(x));
    }

    @Benchmark
    public void cast(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(cast(x));
    }

    @Benchmark
    public void zeros(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(zeros(x));
    }
    // Exact reference implementation: compares the exact value of x with
    // the exact value of (double) x.
    public static boolean bigDecimal(long x) {
        BigDecimal a = new BigDecimal(x);
        BigDecimal b = new BigDecimal((double) x);
        return a.compareTo(b) == 0;
    }

    // Round-trip through double. Long.MAX_VALUE must be excluded: it rounds
    // up to 2^63, which casts back to Long.MAX_VALUE, so the round-trip
    // "succeeds" even though the value is not representable.
    public static boolean cast(long x) {
        return x == (long) (double) x
                && x != Long.MAX_VALUE;
    }

    // x fits iff the span from the highest to the lowest set bit of |x| is at
    // most 53 bits, i.e. leading zeros + trailing zeros exceed 10.
    // (Math.abs overflows for Long.MIN_VALUE, but the answer is still right.)
    public static boolean zeros(long x) {
        long a = Math.abs(x);
        int z = Long.numberOfLeadingZeros(a);
        return z > 10 || Long.numberOfTrailingZeros(a) > 10 - z;
    }
    private static final long[] NUMBERS = {
            0,
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
            -1, -2, -3, -4, -5, -6, -7, -8, -9, -10,
            123, 456, 789,
            -123, -456, -789,
            101112, 131415, 161718,
            -101112, -131415, -161718,
            11L,
            222L,
            3333L,
            44444L,
            555555L,
            6666666L,
            77777777L,
            888888888L,
            9999999999L,
            1111L,
            22222L,
            333333L,
            4444444L,
            55555555L,
            666666666L,
            7777777777L,
            88888888888L,
            999999999999L,
            11111111,
            222222222,
            3333333333L,
            44444444444L,
            555555555555L,
            6666666666666L,
            77777777777777L,
            888888888888888L,
            9999999999999999L,
            Long.MAX_VALUE,
            Long.MAX_VALUE - 1,
            Long.MIN_VALUE,
            Long.MIN_VALUE + 1,
            (1L << 53),
            (1L << 53) + 1,
            (1L << 53) + 2,
            (1L << 60),
            (1L << 60) + 1,
            (1L << 60) + 8,
            (1L << 60) + 32,
            (1L << 60) + 64,
            (1L << 60) + 128,
            (1L << 60) + 256,
            (-1L << 53),
            (-1L << 53) - 1,
            (-1L << 53) - 2,
            (-1L << 60),
            (-1L << 60) - 1,
            (-1L << 60) - 8,
            (-1L << 60) - 32,
            (-1L << 60) - 64,
            (-1L << 60) - 128,
            (-1L << 60) - 256
    };
}
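For what it's worth, the three implementations agree on every value in NUMBERS; a throwaway check like the one below (my own snippet, dropped into main() before the Runner call) confirms it:

// Sanity check (not part of the benchmark): all three implementations
// should return the same answer for every test value.
for (long x : NUMBERS) {
    boolean expected = bigDecimal(x);
    if (cast(x) != expected || zeros(x) != expected) {
        throw new AssertionError("implementations disagree for x = " + x);
    }
}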
There are small differences in our environments.
Me: Windows 10, JDK 1.8.0_45, "zeros" is the fastest
Him: Windows 7, JDK 1.8.0_20, "cast" is the fastest
Our results are self-consistent from run to run, whether running in an IDE or from the command line. We're using JMH 1.10.5.
What is happening here? The benchmark seems untrustworthy and I don't know how to fix it.
I can reproduce different results even on the same machine with the same environment: sometimes cast is slightly faster, sometimes zeros is.
# JMH 1.10.5 (released 9 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_40\jre\bin\java.exe
# VM options: -Didea.launcher.port=7540 -Didea.launcher.bin.path=C:\Program Files (x86)\IDEA 14.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: bench.LongDouble.cast
# Run progress: 0,00% complete, ETA 00:01:20
# Fork: 1 of 5
# Warmup Iteration 1: 513,793 ns/op
# Warmup Iteration 2: 416,508 ns/op
# Warmup Iteration 3: 402,110 ns/op
Iteration 1: 402,535 ns/op
Iteration 2: 403,999 ns/op
Iteration 3: 404,871 ns/op
Iteration 4: 404,845 ns/op
Iteration 5: 401,705 ns/op
# Run progress: 10,00% complete, ETA 00:01:16
# Fork: 2 of 5
# Warmup Iteration 1: 421,552 ns/op
# Warmup Iteration 2: 418,925 ns/op
# Warmup Iteration 3: 421,813 ns/op
Iteration 1: 420,978 ns/op
Iteration 2: 422,940 ns/op
Iteration 3: 422,009 ns/op
Iteration 4: 423,011 ns/op
Iteration 5: 422,406 ns/op
# Run progress: 20,00% complete, ETA 00:01:07
# Fork: 3 of 5
# Warmup Iteration 1: 414,057 ns/op
# Warmup Iteration 2: 410,364 ns/op
# Warmup Iteration 3: 402,330 ns/op
Iteration 1: 402,776 ns/op
Iteration 2: 404,764 ns/op
Iteration 3: 400,346 ns/op
Iteration 4: 403,227 ns/op
Iteration 5: 403,350 ns/op
# Run progress: 30,00% complete, ETA 00:00:58
# Fork: 4 of 5
# Warmup Iteration 1: 422,161 ns/op
# Warmup Iteration 2: 419,118 ns/op
# Warmup Iteration 3: 402,990 ns/op
Iteration 1: 401,592 ns/op
Iteration 2: 402,999 ns/op
Iteration 3: 403,035 ns/op
Iteration 4: 402,625 ns/op
Iteration 5: 403,396 ns/op
# Run progress: 40,00% complete, ETA 00:00:50
# Fork: 5 of 5
# Warmup Iteration 1: 422,621 ns/op
# Warmup Iteration 2: 419,596 ns/op
# Warmup Iteration 3: 403,047 ns/op
Iteration 1: 403,438 ns/op
Iteration 2: 405,066 ns/op
Iteration 3: 403,271 ns/op
Iteration 4: 403,021 ns/op
Iteration 5: 402,162 ns/op
Result "cast":
406,975 ±(99.9%) 5,906 ns/op [Average]
(min, avg, max) = (400,346, 406,975, 423,011), stdev = 7,884
CI (99.9%): [401,069, 412,881] (assumes normal distribution)
# JMH 1.9.3 (released 114 days ago, please consider updating!)
# VM invoker: C:\Program Files\Java\jdk1.8.0_40\jre\bin\java.exe
# VM options: -Didea.launcher.port=7540 -Didea.launcher.bin.path=C:\Program Files (x86)\IDEA 14.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: bench.LongDouble.zeros
# Run progress: 50,00% complete, ETA 00:00:41
# Fork: 1 of 5
# Warmup Iteration 1: 439,529 ns/op
# Warmup Iteration 2: 437,752 ns/op
# Warmup Iteration 3: 390,530 ns/op
Iteration 1: 389,394 ns/op
Iteration 2: 391,453 ns/op
Iteration 3: 390,446 ns/op
Iteration 4: 390,822 ns/op
Iteration 5: 389,850 ns/op
# Run progress: 60,00% complete, ETA 00:00:33
# Fork: 2 of 5
# Warmup Iteration 1: 438,252 ns/op
# Warmup Iteration 2: 437,446 ns/op
# Warmup Iteration 3: 448,328 ns/op
Iteration 1: 389,979 ns/op
Iteration 2: 392,741 ns/op
Iteration 3: 390,575 ns/op
Iteration 4: 390,492 ns/op
Iteration 5: 390,000 ns/op
# Run progress: 70,00% complete, ETA 00:00:25
# Fork: 3 of 5
# Warmup Iteration 1: 447,939 ns/op
# Warmup Iteration 2: 444,489 ns/op
# Warmup Iteration 3: 414,433 ns/op
Iteration 1: 417,409 ns/op
Iteration 2: 413,518 ns/op
Iteration 3: 413,388 ns/op
Iteration 4: 414,040 ns/op
Iteration 5: 415,935 ns/op
# Run progress: 80,00% complete, ETA 00:00:16
# Fork: 4 of 5
# Warmup Iteration 1: 439,012 ns/op
# Warmup Iteration 2: 437,345 ns/op
# Warmup Iteration 3: 388,208 ns/op
Iteration 1: 395,647 ns/op
Iteration 2: 389,221 ns/op
Iteration 3: 387,539 ns/op
Iteration 4: 388,524 ns/op
Iteration 5: 387,623 ns/op
# Run progress: 90,00% complete, ETA 00:00:08
# Fork: 5 of 5
# Warmup Iteration 1: 446,116 ns/op
# Warmup Iteration 2: 446,622 ns/op
# Warmup Iteration 3: 409,116 ns/op
Iteration 1: 409,761 ns/op
Iteration 2: 410,146 ns/op
Iteration 3: 410,060 ns/op
Iteration 4: 409,370 ns/op
Iteration 5: 411,114 ns/op
Result "zeros":
399,162 ±(99.9%) 8,487 ns/op [Average]
(min, avg, max) = (387,539, 399,162, 417,409), stdev = 11,330
CI (99.9%): [390,675, 407,649] (assumes normal distribution)
# Run complete. Total time: 00:01:23
Benchmark Mode Cnt Score Error Units
LongDouble.cast avgt 25 406,975 ± 5,906 ns/op
LongDouble.zeros avgt 25 399,162 ± 8,487 ns/op
After some analysis I found that the problem is not in the benchmark but in JMH itself. The perfasm profiler pointed to the Blackhole.consume method:
public final void consume(boolean bool) {
    boolean bool1 = this.bool1; // volatile read
    boolean bool2 = this.bool2;
    if (bool == bool1 & bool == bool2) {
        // SHOULD NEVER HAPPEN
        nullBait.bool1 = bool; // implicit null pointer exception
    }
}
The interesting part is how bool1 and bool2 are initialized:

Random r = new Random(System.nanoTime());
...
bool1 = r.nextBoolean(); bool2 = !bool1;

Yes, they are random on every run! As you know, the JIT compiler relies on the run-time execution profile, so the generated code varies slightly depending on the initial values of bool1 and bool2: in half of the cases it considers the branch taken, and in the other half untaken. That's where the difference comes from.
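To illustrate the idea (this is my own sketch, not the real Blackhole and not necessarily the patch I proposed): if the two fields were initialized to fixed, distinct values instead of random ones, the JIT would see the same comparison profile on every run.

// Hypothetical sketch only: a consume() whose branch profile does not
// depend on a random seed. Field names mirror the Blackhole excerpt above.
public final class FixedSink {
    public volatile boolean bool1 = true;  // fixed, instead of r.nextBoolean()
    public boolean bool2 = false;          // always the opposite of bool1
    public FixedSink nullBait = null;

    public void consume(boolean bool) {
        boolean b1 = this.bool1; // volatile read
        boolean b2 = this.bool2;
        if (bool == b1 & bool == b2) {
            // still unreachable (b1 != b2), but the comparison profile is now
            // identical from run to run
            nullBait.bool1 = bool; // implicit null pointer exception
        }
    }
}

The "impossible" branch stays impossible, since the two fields always differ; the only change is that the generated code no longer depends on the random seed.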
I've submitted a report against JMH with a suggested fix, in case the authors confirm the defect.