I'm migrating a Spring 3.1.2 batch application from jdk1.7.0_25 to jdk1.7.0_40, both x64 and from Oracle.
Using Sun's OperatingSystemMXBean.getProcessCpuTime() as a performance metric, the results show a 2.5x decrease in performance (i.e., my application runs much faster on u25).
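For reference, a minimal sketch of how process CPU time can be read this way. The class name CpuTimeProbe and the workload() method are hypothetical stand-ins, not the application's actual code; the sketch assumes a HotSpot-style JVM where the platform bean implements com.sun.management.OperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;

final class CpuTimeProbe {

    /** Returns the process CPU time in nanoseconds, or -1 if this JVM does not expose it. */
    static long processCpuTimeNanos() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        // The Sun/Oracle extension interface carries getProcessCpuTime().
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    /** Hypothetical stand-in for the batch job being measured. */
    static double workload(int iterations) {
        double sum = 0;
        for (int i = 1; i <= iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = processCpuTimeNanos();
        double result = workload(5_000_000);
        long after = processCpuTimeNanos();

        System.out.println("result = " + result);
        if (before < 0 || after < 0) {
            System.out.println("Process CPU time is not available on this JVM");
        } else {
            System.out.println("CPU time (ms) = " + (after - before) / 1_000_000);
        }
    }
}
```

Running the same class under each JDK and comparing the reported CPU time gives a rough comparison; it includes JIT and GC threads, so it is noisier than a dedicated benchmark harness.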
What I have checked so far:
- java.util.HashMap and java.util.ArrayList changes: results are the same when bootstrapping u40 with u25's HashMap and ArrayList classes, and these changes are simply too minor for this kind of difference.
- Running with -Xbatch and -Xcomp produces the same results (assuming server compilation is the same between these JDKs).
- java.lang.invoke.MethodHandles also changed, but that seems unrelated, unless Spring 3.1.2 makes use of them, which I could not find evidence of.
- javac compilation seems unchanged as well.

Some general notes:
- The problem also shows up in later versions (jdk1.8.0), while versions < u40 seem just fine (including various versions of JDK 6).
- ... +UseSerialGC for testing).
- ... jdk1.7.0_40.

Any tips or ideas?
Edited to add: The structure of the application is an outer loop which runs financial Monte Carlo simulations, i.e. lots of dates, calculations, etc. As such, it's currently a bit complex and, I agree, not ideal for finding the issue. I'll have to try to scale it down.
It looks like the problem is due to work done in JDK-7133857, in which java.lang.Math.pow() and java.lang.Math.exp() were intrinsified and calculated using x87.
These methods are used extensively (!) in the profiled application, hence their considerable effect.
JDK-8029302 describes and fixes the issue for power of 2 inputs, and testing the application with jdk1.8.0_25 (in which the issue was fixed) shows improved performance, though not back to the higher level of jdk1.7.0_25, before intrinsification was done.
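To illustrate the kind of special case involved, here is a minimal application-level sketch. It assumes the fix amounts to treating an exponent of exactly 2 as a plain multiplication; the class and method names are hypothetical, and this is not the JDK's actual intrinsic code.

```java
final class PowSpecialCase {

    private PowSpecialCase() {}

    /**
     * Hypothetical helper: squares by multiplication and falls back to
     * Math.pow() for every other exponent.
     */
    static double pow(double x, double y) {
        if (y == 2.0) {
            // x * x is the correctly rounded square, so it stays within
            // the 1 ulp accuracy that Math.pow() documents.
            return x * x;
        }
        return Math.pow(x, y);
    }

    public static void main(String[] args) {
        System.out.println(pow(3.0, 2.0)); // multiplication path
        System.out.println(pow(2.0, 0.5)); // general Math.pow() path
    }
}
```

A helper like this only helps call sites where the exponent is frequently 2; for arbitrary exponents it still ends up in Math.pow().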
Here are my JMH benchmarks and their results for Math.pow() on the three relevant JDK versions:
    package org.sample;

    import org.openjdk.jmh.annotations.*;

    public class MyBenchmark {

        // Inputs are mutated on every invocation so each call sees new arguments.
        @State(Scope.Benchmark)
        public static class ThreadState {
            volatile double x = 0;
            volatile double y = 0;
        }

        // Varying base and exponent (the exponent grows by 0.5 per call).
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public double powx(ThreadState state) {
            state.x++;
            state.y += 0.5;
            return Math.pow(state.x, state.y);
        }

        // Fixed integer exponent 3.
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public double pow3(ThreadState state) {
            state.x++;
            return Math.pow(state.x, 3);
        }

        // Fixed integer exponent 2 (the case addressed by JDK-8029302).
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public double pow2(ThreadState state) {
            state.x++;
            return Math.pow(state.x, 2);
        }
    }
The results:
Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
# VM invoker: x:\@sdks\jdks\jdk1.7.0_25\jre\bin\java.exe
...
Result: 4877658.355 (99.9%) 330460.323 ops/s [Average]
Statistics: (min, avg, max) = (1216417.493, 4877658.355, 6421780.276), stdev = 1399189.700
Confidence interval (99.9%): [4547198.032, 5208118.678]
# Run complete. Total time: 00:24:48
Benchmark              Mode  Samples         Score  Score error  Units
o.s.MyBenchmark.pow2  thrpt      200  40160618.138  1561135.596  ops/s
o.s.MyBenchmark.pow3  thrpt      200   3678800.153    88678.269  ops/s
o.s.MyBenchmark.powx  thrpt      200   4877658.355   330460.323  ops/s
# VM invoker: x:\@sdks\jdks\jdk1.7.0_40\jre\bin\java.exe
...
Result: 1860849.245 (99.9%) 94303.387 ops/s [Average]
Statistics: (min, avg, max) = (418909.582, 1860849.245, 2379936.035), stdev = 399286.444
Confidence interval (99.9%): [1766545.859, 1955152.632]
# Run complete. Total time: 00:24:48
Benchmark              Mode  Samples         Score  Score error  Units
o.s.MyBenchmark.pow2  thrpt      200   9619333.987   230749.333  ops/s
o.s.MyBenchmark.pow3  thrpt      200   9240043.369   238456.949  ops/s
o.s.MyBenchmark.powx  thrpt      200   1860849.245    94303.387  ops/s
# VM invoker: x:\@sdks\jdks\jdk1.8.0_25\jre\bin\java.exe
...
Result: 1898015.057 (99.9%) 92555.236 ops/s [Average]
Statistics: (min, avg, max) = (649562.297, 1898015.057, 2359474.902), stdev = 391884.665
Confidence interval (99.9%): [1805459.821, 1990570.293]
# Run complete. Total time: 00:24:37
Benchmark              Mode  Samples         Score  Score error  Units
o.s.MyBenchmark.pow2  thrpt      200  81840274.815  1979190.065  ops/s
o.s.MyBenchmark.pow3  thrpt      200   9441518.686   206612.404  ops/s
o.s.MyBenchmark.powx  thrpt      200   1898015.057    92555.236  ops/s
If I'm reading this right, the power of 2 issue was certainly fixed in JDK-8029302, and performance for integer powers greater than 2 (I only tested Math.pow(x, 3)) improved in jdk1.7.0_40. As for the weird non-int Math.pow()s, as done above in the powx() benchmark, there still seems to be a considerable performance regression (roughly 2.6x by these scores) when moving from jdk1.7.0_25 to jdk1.7.0_40.
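To put numbers on the comparison, the throughput ratios can be computed directly from the Score columns above. This is only a small arithmetic sketch; the class name ScoreRatios and its layout are hypothetical, and the constants are copied from the reported results.

```java
final class ScoreRatios {

    private ScoreRatios() {}

    // Scores in ops/s, in the order {pow2, pow3, powx}, copied from the tables above.
    static final double[] JDK_7U25 = {40160618.138, 3678800.153, 4877658.355};
    static final double[] JDK_7U40 = {9619333.987, 9240043.369, 1860849.245};
    static final double[] JDK_8U25 = {81840274.815, 9441518.686, 1898015.057};

    static final String[] NAMES = {"pow2", "pow3", "powx"};

    /** Throughput of the baseline divided by the candidate; values above 1 mean the candidate is slower. */
    static double slowdown(double baselineOps, double candidateOps) {
        return baselineOps / candidateOps;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(String.format(
                    "%s: u25/u40 = %.2f, u25/8u25 = %.2f",
                    NAMES[i],
                    slowdown(JDK_7U25[i], JDK_7U40[i]),
                    slowdown(JDK_7U25[i], JDK_8U25[i])));
        }
    }
}
```

Ratios below 1 in the output indicate cases where the newer JDK is faster than u25, as with pow3.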
Replacing Math.pow() and Math.exp() with their respective methods in org.apache.commons.math3.util.FastMath completely solves the problem, with an increase in performance. This is the correct solution as far as I'm concerned.
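One way to keep such a swap in a single place is to route the calls through a small helper. The sketch below is only an illustration: the class NumericOps and the discountFactor() example are hypothetical, and the bodies delegate to java.lang.Math so the code compiles without extra dependencies. The comments mark where FastMath.pow() and FastMath.exp() would go, assuming Commons Math 3 is on the classpath.

```java
final class NumericOps {

    private NumericOps() {}

    static double pow(double base, double exponent) {
        // With Commons Math 3 available, this would become:
        // return org.apache.commons.math3.util.FastMath.pow(base, exponent);
        return Math.pow(base, exponent);
    }

    static double exp(double x) {
        // With Commons Math 3 available, this would become:
        // return org.apache.commons.math3.util.FastMath.exp(x);
        return Math.exp(x);
    }

    /** Hypothetical caller: continuous discount factor exp(-rate * years). */
    static double discountFactor(double rate, double years) {
        return exp(-rate * years);
    }

    public static void main(String[] args) {
        System.out.println(discountFactor(0.03, 5.0));
        System.out.println(pow(1.03, 5.0));
    }
}
```

Since the two implementations may not agree to the last bit, it is worth re-validating numerical results after switching.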
Note: This would've been somewhat simpler if there was an easy way (i.e. without the requirement of building the JDK) to set the -XX:-InlineIntrinsics flag.