I'm trying to write a method like this:
static boolean fitsInDouble(long x) {
    // return true if x can be represented
    // as a numerically-equivalent double
}
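For example, the boundary sits at 2^53 (a quick illustration of my own, assuming the standard IEEE-754 double with its 53-bit significand):

long fits = 1L << 53;           // 9007199254740992: exactly representable
long doesNotFit = fits + 1;     // 9007199254740993: not representable
System.out.println((long) (double) fits == fits);             // true
System.out.println((long) (double) doesNotFit == doesNotFit); // false, rounds back to 2^53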
I'm trying to find the most efficient implementation. I settled on one, but then a coworker ran the same benchmarks and got different relative results: the fastest implementation for me isn't the fastest for him.
Is there something wrong with these benchmarks?
package rnd;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
public class Benchmarks {

    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
                .include(Benchmarks.class.getName())
                .build();
        new Runner(options).run();
    }

    @Benchmark
    public void bigDecimal(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(bigDecimal(x));
    }

    @Benchmark
    public void cast(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(cast(x));
    }

    @Benchmark
    public void zeros(Blackhole bh) {
        for (long x : NUMBERS) bh.consume(zeros(x));
    }
    // Exact reference implementation: compares the exact value of x with
    // the exact value of (double) x.
    public static boolean bigDecimal(long x) {
        BigDecimal a = new BigDecimal(x);
        BigDecimal b = new BigDecimal((double) x);
        return a.compareTo(b) == 0;
    }

    // Round-trip through double. Long.MAX_VALUE must be excluded: it rounds
    // up to 2^63, which casts back to Long.MAX_VALUE, so the round-trip
    // "succeeds" even though the value is not representable.
    public static boolean cast(long x) {
        return x == (long) (double) x
                && x != Long.MAX_VALUE;
    }

    // x fits iff the span from the highest to the lowest set bit of |x| is at
    // most 53 bits, i.e. leading zeros + trailing zeros exceed 10.
    // (Math.abs overflows for Long.MIN_VALUE, but the answer is still right.)
    public static boolean zeros(long x) {
        long a = Math.abs(x);
        int z = Long.numberOfLeadingZeros(a);
        return z > 10 || Long.numberOfTrailingZeros(a) > 10 - z;
    }
    private static final long[] NUMBERS = {
            0,
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
            -1, -2, -3, -4, -5, -6, -7, -8, -9, -10,
            123, 456, 789,
            -123, -456, -789,
            101112, 131415, 161718,
            -101112, -131415, -161718,
            11L,
            222L,
            3333L,
            44444L,
            555555L,
            6666666L,
            77777777L,
            888888888L,
            9999999999L,
            1111L,
            22222L,
            333333L,
            4444444L,
            55555555L,
            666666666L,
            7777777777L,
            88888888888L,
            999999999999L,
            11111111,
            222222222,
            3333333333L,
            44444444444L,
            555555555555L,
            6666666666666L,
            77777777777777L,
            888888888888888L,
            9999999999999999L,
            Long.MAX_VALUE,
            Long.MAX_VALUE - 1,
            Long.MIN_VALUE,
            Long.MIN_VALUE + 1,
            (1L << 53),
            (1L << 53) + 1,
            (1L << 53) + 2,
            (1L << 60),
            (1L << 60) + 1,
            (1L << 60) + 8,
            (1L << 60) + 32,
            (1L << 60) + 64,
            (1L << 60) + 128,
            (1L << 60) + 256,
            (-1L << 53),
            (-1L << 53) - 1,
            (-1L << 53) - 2,
            (-1L << 60),
            (-1L << 60) - 1,
            (-1L << 60) - 8,
            (-1L << 60) - 32,
            (-1L << 60) - 64,
            (-1L << 60) - 128,
            (-1L << 60) - 256
    };
}
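For what it's worth, the three implementations agree on every value in NUMBERS; a throwaway check like the one below (my own snippet, dropped into main() before the Runner call) confirms it:

// Sanity check (not part of the benchmark): all three implementations
// should return the same answer for every test value.
for (long x : NUMBERS) {
    boolean expected = bigDecimal(x);
    if (cast(x) != expected || zeros(x) != expected) {
        throw new AssertionError("implementations disagree for x = " + x);
    }
}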
There are small differences in our environments.
Me: Windows 10, JDK 1.8.0_45, "zeros" is the fastest
Him: Windows 7, JDK 1.8.0_20, "cast" is the fastest
Our results are self-consistent from run to run, whether running in an IDE or from the command line. We're using JMH 1.10.5.
What is happening here? The benchmark seems untrustworthy and I don't know how to fix it.
I can reproduce different results even on the same machine with the same environment: sometimes cast is slightly faster, sometimes zeros is.
# JMH 1.10.5 (released 9 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_40\jre\bin\java.exe
# VM options: -Didea.launcher.port=7540 -Didea.launcher.bin.path=C:\Program Files (x86)\IDEA 14.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: bench.LongDouble.cast
# Run progress: 0,00% complete, ETA 00:01:20
# Fork: 1 of 5
# Warmup Iteration 1: 513,793 ns/op
# Warmup Iteration 2: 416,508 ns/op
# Warmup Iteration 3: 402,110 ns/op
Iteration 1: 402,535 ns/op
Iteration 2: 403,999 ns/op
Iteration 3: 404,871 ns/op
Iteration 4: 404,845 ns/op
Iteration 5: 401,705 ns/op
# Run progress: 10,00% complete, ETA 00:01:16
# Fork: 2 of 5
# Warmup Iteration 1: 421,552 ns/op
# Warmup Iteration 2: 418,925 ns/op
# Warmup Iteration 3: 421,813 ns/op
Iteration 1: 420,978 ns/op
Iteration 2: 422,940 ns/op
Iteration 3: 422,009 ns/op
Iteration 4: 423,011 ns/op
Iteration 5: 422,406 ns/op
# Run progress: 20,00% complete, ETA 00:01:07
# Fork: 3 of 5
# Warmup Iteration 1: 414,057 ns/op
# Warmup Iteration 2: 410,364 ns/op
# Warmup Iteration 3: 402,330 ns/op
Iteration 1: 402,776 ns/op
Iteration 2: 404,764 ns/op
Iteration 3: 400,346 ns/op
Iteration 4: 403,227 ns/op
Iteration 5: 403,350 ns/op
# Run progress: 30,00% complete, ETA 00:00:58
# Fork: 4 of 5
# Warmup Iteration 1: 422,161 ns/op
# Warmup Iteration 2: 419,118 ns/op
# Warmup Iteration 3: 402,990 ns/op
Iteration 1: 401,592 ns/op
Iteration 2: 402,999 ns/op
Iteration 3: 403,035 ns/op
Iteration 4: 402,625 ns/op
Iteration 5: 403,396 ns/op
# Run progress: 40,00% complete, ETA 00:00:50
# Fork: 5 of 5
# Warmup Iteration 1: 422,621 ns/op
# Warmup Iteration 2: 419,596 ns/op
# Warmup Iteration 3: 403,047 ns/op
Iteration 1: 403,438 ns/op
Iteration 2: 405,066 ns/op
Iteration 3: 403,271 ns/op
Iteration 4: 403,021 ns/op
Iteration 5: 402,162 ns/op
Result "cast":
406,975 ±(99.9%) 5,906 ns/op [Average]
(min, avg, max) = (400,346, 406,975, 423,011), stdev = 7,884
CI (99.9%): [401,069, 412,881] (assumes normal distribution)
# JMH 1.9.3 (released 114 days ago, please consider updating!)
# VM invoker: C:\Program Files\Java\jdk1.8.0_40\jre\bin\java.exe
# VM options: -Didea.launcher.port=7540 -Didea.launcher.bin.path=C:\Program Files (x86)\IDEA 14.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: bench.LongDouble.zeros
# Run progress: 50,00% complete, ETA 00:00:41
# Fork: 1 of 5
# Warmup Iteration 1: 439,529 ns/op
# Warmup Iteration 2: 437,752 ns/op
# Warmup Iteration 3: 390,530 ns/op
Iteration 1: 389,394 ns/op
Iteration 2: 391,453 ns/op
Iteration 3: 390,446 ns/op
Iteration 4: 390,822 ns/op
Iteration 5: 389,850 ns/op
# Run progress: 60,00% complete, ETA 00:00:33
# Fork: 2 of 5
# Warmup Iteration 1: 438,252 ns/op
# Warmup Iteration 2: 437,446 ns/op
# Warmup Iteration 3: 448,328 ns/op
Iteration 1: 389,979 ns/op
Iteration 2: 392,741 ns/op
Iteration 3: 390,575 ns/op
Iteration 4: 390,492 ns/op
Iteration 5: 390,000 ns/op
# Run progress: 70,00% complete, ETA 00:00:25
# Fork: 3 of 5
# Warmup Iteration 1: 447,939 ns/op
# Warmup Iteration 2: 444,489 ns/op
# Warmup Iteration 3: 414,433 ns/op
Iteration 1: 417,409 ns/op
Iteration 2: 413,518 ns/op
Iteration 3: 413,388 ns/op
Iteration 4: 414,040 ns/op
Iteration 5: 415,935 ns/op
# Run progress: 80,00% complete, ETA 00:00:16
# Fork: 4 of 5
# Warmup Iteration 1: 439,012 ns/op
# Warmup Iteration 2: 437,345 ns/op
# Warmup Iteration 3: 388,208 ns/op
Iteration 1: 395,647 ns/op
Iteration 2: 389,221 ns/op
Iteration 3: 387,539 ns/op
Iteration 4: 388,524 ns/op
Iteration 5: 387,623 ns/op
# Run progress: 90,00% complete, ETA 00:00:08
# Fork: 5 of 5
# Warmup Iteration 1: 446,116 ns/op
# Warmup Iteration 2: 446,622 ns/op
# Warmup Iteration 3: 409,116 ns/op
Iteration 1: 409,761 ns/op
Iteration 2: 410,146 ns/op
Iteration 3: 410,060 ns/op
Iteration 4: 409,370 ns/op
Iteration 5: 411,114 ns/op
Result "zeros":
399,162 ±(99.9%) 8,487 ns/op [Average]
(min, avg, max) = (387,539, 399,162, 417,409), stdev = 11,330
CI (99.9%): [390,675, 407,649] (assumes normal distribution)
# Run complete. Total time: 00:01:23
Benchmark Mode Cnt Score Error Units
LongDouble.cast avgt 25 406,975 ± 5,906 ns/op
LongDouble.zeros avgt 25 399,162 ± 8,487 ns/op
After some analysis I found that the problem is not in the benchmark but in JMH itself. The perfasm profiler pointed to the Blackhole.consume method:
public final void consume(boolean bool) {
    boolean bool1 = this.bool1; // volatile read
    boolean bool2 = this.bool2;
    if (bool == bool1 & bool == bool2) {
        // SHOULD NEVER HAPPEN
        nullBait.bool1 = bool; // implicit null pointer exception
    }
}
The interesting part is how bool1 and bool2 are initialized:

Random r = new Random(System.nanoTime());
...
bool1 = r.nextBoolean(); bool2 = !bool1;

Yes, they are random on every run! As you know, the JIT compiler relies on the run-time execution profile, so the generated code varies slightly depending on the initial values of bool1 and bool2: in half of the cases it considers the branch taken, and in the other half untaken. That's where the difference comes from.
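To illustrate the idea (this is my own sketch, not the real Blackhole and not necessarily the patch I proposed): if the two fields were initialized to fixed, distinct values instead of random ones, the JIT would see the same comparison profile on every run.

// Hypothetical sketch only: a consume() whose branch profile does not
// depend on a random seed. Field names mirror the Blackhole excerpt above.
public final class FixedSink {
    public volatile boolean bool1 = true;  // fixed, instead of r.nextBoolean()
    public boolean bool2 = false;          // always the opposite of bool1
    public FixedSink nullBait = null;

    public void consume(boolean bool) {
        boolean b1 = this.bool1; // volatile read
        boolean b2 = this.bool2;
        if (bool == b1 & bool == b2) {
            // still unreachable (b1 != b2), but the comparison profile is now
            // identical from run to run
            nullBait.bool1 = bool; // implicit null pointer exception
        }
    }
}

The "impossible" branch stays impossible, since the two fields always differ; the only change is that the generated code no longer depends on the random seed.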
I've submitted a report against JMH with a suggested fix, in case the authors confirm the defect.