A few hours ago I answered another Stack Overflow question, and it gave a very surprising result. The answer can be found here. The answer was (and is) partially wrong; however, I want to focus on byte addition here.
Strictly speaking, it is byte-to-long addition.
This is the benchmark code I have been using:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

public class ByteAdditionBenchmark {

    private void start() {
        int[] sizes = {
            700_000, // the first entry doubles as an additional warmup run
            1_000,
            10_000,
            25_000,
            50_000,
            100_000,
            200_000,
            300_000,
            400_000,
            500_000,
            600_000,
            700_000,
        };

        for (int size : sizes) {
            List<byte[]> arrays = createByteArrays(size);
            // Warmup
            arrays.forEach(this::byteArrayCheck);
            benchmark(arrays, this::byteArrayCheck, "byteArrayCheck");
        }
    }

    private void benchmark(final List<byte[]> arrays, final Consumer<byte[]> method, final String name) {
        long start = System.nanoTime();
        arrays.forEach(method);
        long end = System.nanoTime();
        double nanosecondsPerIteration = (end - start) * 1d / arrays.size();
        System.out.println("Benchmark: " + name + " / iterations: " + arrays.size() + " / time per iteration: " + nanosecondsPerIteration + " ns");
    }

    private List<byte[]> createByteArrays(final int amount) {
        Random random = new Random();
        List<byte[]> resultList = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            byte[] byteArray = new byte[4096];
            byteArray[random.nextInt(4096)] = 1;
            resultList.add(byteArray);
        }
        return resultList;
    }

    private boolean byteArrayCheck(final byte[] array) {
        long sum = 0L;
        for (byte b : array) {
            sum += b;
        }
        return (sum == 0);
    }

    public static void main(String[] args) {
        new ByteAdditionBenchmark().start();
    }
}
And these are the results I have been getting:
Benchmark: byteArrayCheck / iterations: 700000 / time per iteration: 50.26538857142857 ns
Benchmark: byteArrayCheck / iterations: 1000 / time per iteration: 20.12 ns
Benchmark: byteArrayCheck / iterations: 10000 / time per iteration: 9.1289 ns
Benchmark: byteArrayCheck / iterations: 25000 / time per iteration: 10.02972 ns
Benchmark: byteArrayCheck / iterations: 50000 / time per iteration: 9.04478 ns
Benchmark: byteArrayCheck / iterations: 100000 / time per iteration: 18.44992 ns
Benchmark: byteArrayCheck / iterations: 200000 / time per iteration: 15.48304 ns
Benchmark: byteArrayCheck / iterations: 300000 / time per iteration: 15.806353333333334 ns
Benchmark: byteArrayCheck / iterations: 400000 / time per iteration: 16.923685 ns
Benchmark: byteArrayCheck / iterations: 500000 / time per iteration: 16.131066 ns
Benchmark: byteArrayCheck / iterations: 600000 / time per iteration: 16.435461666666665 ns
Benchmark: byteArrayCheck / iterations: 700000 / time per iteration: 17.107615714285714 ns
To my knowledge, the JVM should already be fully warmed up after the first 700,000 iterations, before it starts to spit out the actual benchmarking data.
How can it be, then, that despite the warmup the performance is still unpredictable? Almost directly after the warmup, byte addition gets blazingly fast, but after that it seems to converge back to a nominal 16 ns per iteration.
The tests were run on a PC with a stock-clocked Intel i7 3770 and 16 GB of RAM, so I can't go much beyond 700,000 iterations (the array data alone takes about 700,000 × 4096 bytes ≈ 2.9 GB of heap). It is running on Windows 8.1 64-bit, if that matters.
It turns out that the JIT was optimizing everything away, as per raphw's suggestion.
Therefore I replaced the benchmark method with the following:
private void benchmark(final List<byte[]> arrays, final Predicate<byte[]> method, final String name) {
    long start = System.nanoTime();
    boolean someUnrelatedResult = false;
    for (byte[] array : arrays) {
        someUnrelatedResult |= method.test(array);
    }
    long end = System.nanoTime();
    double nanosecondsPerIteration = (end - start) * 1d / arrays.size();
    System.out.println("Result: " + someUnrelatedResult);
    System.out.println("Benchmark: " + name + " / iterations: " + arrays.size() + " / time per iteration: " + nanosecondsPerIteration + " ns");
}
This ensures that the computation cannot be optimized away, and the test results confirm it (the result print is omitted for clarity):
Benchmark: byteArrayCheck / iterations: 700000 / time per iteration: 1658.2627914285715 ns
Benchmark: byteArrayCheck / iterations: 1000 / time per iteration: 1241.706 ns
Benchmark: byteArrayCheck / iterations: 10000 / time per iteration: 1215.941 ns
Benchmark: byteArrayCheck / iterations: 25000 / time per iteration: 1332.94656 ns
Benchmark: byteArrayCheck / iterations: 50000 / time per iteration: 1456.0361 ns
Benchmark: byteArrayCheck / iterations: 100000 / time per iteration: 1753.26777 ns
Benchmark: byteArrayCheck / iterations: 200000 / time per iteration: 1756.93283 ns
Benchmark: byteArrayCheck / iterations: 300000 / time per iteration: 1762.9992266666666 ns
Benchmark: byteArrayCheck / iterations: 400000 / time per iteration: 1806.854815 ns
Benchmark: byteArrayCheck / iterations: 500000 / time per iteration: 1784.09091 ns
Benchmark: byteArrayCheck / iterations: 600000 / time per iteration: 1804.6096366666666 ns
Benchmark: byteArrayCheck / iterations: 700000 / time per iteration: 1811.0597585714286 ns
I'd say these results look a lot more convincing with respect to the computation time. However, my question still stands: with repeated tests at random times, the same pattern remains that benchmarks with a low number of iterations are faster than those with more iterations, although they do seem to stabilize at around 100,000 iterations or somewhere lower.
What is the explanation?
The reason for your result is that you do not actually know what you are measuring. Java's just-in-time compiler is most certainly having a look at your code, and it might just happen that you are measuring nothing.
The compiler is smart enough to figure out that your List<byte[]> is not actually used for anything. Therefore, it will eventually remove all the related code from your running application. Thus, your benchmark is most likely measuring an increasingly empty application.
The answer to all such questions is always: it is not worth having a discussion before we have actually looked at a valid benchmark. Benchmarking harnesses such as JMH (which I can recommend) know a concept called a black hole. Black holes are meant to confuse the just-in-time compiler into thinking that a computed value is actually used for something, even though it is not. With such black holes, code that would otherwise be erased as a no-op remains in place.
Another typical problem of home-grown benchmarks is optimized loops. The just-in-time compiler will notice that the loop performs the same computation on every iteration and will therefore remove the loop altogether. With a (quality) benchmarking harness, you only suggest a number of loops to run instead of hard-coding them; this way, the harness can take care of tricking the compiler.
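The idea behind a black hole can be sketched without JMH: route every result into a side effect the compiler cannot prove dead, such as a volatile field. This is a simplified illustration of the concept behind JMH's Blackhole.consume, not its actual implementation; the class and method names here are made up for the example:

```java
public class SinkExample {

    // A volatile write is an observable side effect, so the JIT cannot
    // treat the computation feeding it as dead code.
    private static volatile long sink;

    static void consume(long value) {
        sink = value;
    }

    // Same summation as the question's byteArrayCheck
    static long sum(byte[] array) {
        long total = 0L;
        for (byte b : array) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        data[42] = 1;
        for (int i = 0; i < 1_000; i++) {
            consume(sum(data)); // the result is "used", so the loop body survives
        }
        System.out.println(sink); // prints 1
    }
}
```

A real harness does considerably more than this (it also defends against constant folding and loop hoisting), which is why JMH remains the better choice.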
Write a benchmark with JMH and you will see that your measured times are quite different.
Concerning your update: I can only repeat myself. Never trust an unharnessed benchmark! An easy way to find out what the JVM is doing to your code is to run JITwatch. The main concern with your benchmark is that it ignores the JVM's profiling. A profile is an attempt by the JVM to remember properties of your code, on which it then bases its optimizations. For your benchmark, you mix the profiles of different runs together. The JVM then has to renew its current profile and recompile the byte code on the fly, which costs time.
In order to avoid this issue, a harness like JMH lets you fork a new JVM process for each benchmark. Here is what I am measuring with a harnessed benchmark:
Benchmark Mode Samples Mean Mean error Units
o.s.MyBenchmark.test100k avgt 20 1922.671 29.155 ns/op
o.s.MyBenchmark.test10k avgt 20 1911.152 13.217 ns/op
o.s.MyBenchmark.test1k avgt 20 1857.205 3.086 ns/op
o.s.MyBenchmark.test200k avgt 20 1905.360 18.102 ns/op
o.s.MyBenchmark.test25k avgt 20 1832.663 102.562 ns/op
o.s.MyBenchmark.test50k avgt 20 1907.488 18.043 ns/op
And here is the source code to the benchmark based on the mentioned JMH:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
public class MyBenchmark {

    private List<byte[]> input1k, input10k, input25k, input50k, input100k, input200k;

    @Setup
    public void setUp() {
        input1k = createByteArray(1_000);
        input10k = createByteArray(10_000);
        input25k = createByteArray(25_000);
        input50k = createByteArray(50_000);
        input100k = createByteArray(100_000);
        input200k = createByteArray(200_000);
    }

    private static List<byte[]> createByteArray(int length) {
        Random random = new Random();
        List<byte[]> resultList = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            byte[] byteArray = new byte[4096];
            byteArray[random.nextInt(4096)] = 1;
            resultList.add(byteArray);
        }
        return resultList;
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(1_000)
    public boolean test1k() {
        return runBenchmark(input1k, this::byteArrayCheck);
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(10_000)
    public boolean test10k() {
        return runBenchmark(input10k, this::byteArrayCheck);
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(25_000)
    public boolean test25k() {
        return runBenchmark(input25k, this::byteArrayCheck);
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(50_000)
    public boolean test50k() {
        return runBenchmark(input50k, this::byteArrayCheck);
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(100_000)
    public boolean test100k() {
        return runBenchmark(input100k, this::byteArrayCheck);
    }

    @GenerateMicroBenchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(200_000)
    public boolean test200k() {
        return runBenchmark(input200k, this::byteArrayCheck);
    }

    private static boolean runBenchmark(List<byte[]> arrays, Predicate<byte[]> method) {
        boolean someUnrelatedResult = false;
        for (byte[] array : arrays) {
            someUnrelatedResult |= method.test(array);
        }
        return someUnrelatedResult;
    }

    private boolean byteArrayCheck(final byte[] array) {
        long sum = 0L;
        for (byte b : array) {
            sum += b;
        }
        return (sum == 0);
    }

    public static void main(String[] args) throws RunnerException {
        new Runner(new OptionsBuilder()
                .include(".*" + MyBenchmark.class.getSimpleName() + ".*")
                .forks(1)
                .build()).run();
    }
}