
Performance regression when migrating from jdk1.7.0_25 to jdk1.7.0_40

I'm migrating a Spring 3.1.2 batch application from jdk1.7.0_25 to jdk1.7.0_40 (both x64 builds from Oracle).

Using Sun's OperatingSystemMXBean.getProcessCpuTime() as a performance metric, the results show a 2.5x decrease in performance (i.e., my application running on u25 is much faster).
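For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of reading process CPU time next to wall time. The class name CpuTimeProbe and the workload() method are made up for illustration and are not the application's code; the cast assumes a HotSpot-style JVM that exposes com.sun.management.OperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;

final class CpuTimeProbe {

    /** Returns process CPU time in nanoseconds, or -1 if this JVM does not expose it. */
    static long processCpuTimeNanos() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        // The Sun/Oracle extension interface is only available on some JVMs (assumption).
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    /** Hypothetical stand-in workload: sums i^0.5 for i = 1..iterations. */
    static double workload(int iterations) {
        double sum = 0;
        for (int i = 1; i <= iterations; i++) {
            sum += Math.pow(i, 0.5);
        }
        return sum;
    }

    public static void main(String[] args) {
        long cpuStart = processCpuTimeNanos();
        long wallStart = System.nanoTime();

        double result = workload(5_000_000);

        long wallNanos = System.nanoTime() - wallStart;
        long cpuEnd = processCpuTimeNanos();

        System.out.println("result = " + result);
        System.out.println("wall ms = " + wallNanos / 1_000_000);
        if (cpuStart >= 0 && cpuEnd >= 0) {
            System.out.println("cpu ms  = " + (cpuEnd - cpuStart) / 1_000_000);
        } else {
            System.out.println("cpu time not available on this JVM");
        }
    }
}
```

Running the same class under both JDKs and comparing the two numbers gives a rough check that CPU time and wall time move together.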

  • As far as I can tell, this is not due to the java.util.HashMap and java.util.ArrayList changes: results are the same when bootstrapping u40 with u25's HashMap and ArrayList classes, and those changes are too minor to explain a difference of this size.
  • Nor is this related to the HashMap concurrency regression: the application is single-threaded, and that regression was fixed in u40.
  • HotSpot JIT optimizations don't seem to be the issue either, as running with -Xbatch and -Xcomp produces the same results (assuming server compilation is the same between these JDKs).
  • There was a performance regression regarding java.lang.invoke.MethodHandles, but that seems unrelated unless Spring 3.1.2 makes use of them, which I could not find evidence of.
  • javac compilation seems unchanged as well.

Some general notes:

  • This issue appears for every JDK 7 version >= u40 (as well as the latest JDK 8 jdk1.8.0), while versions < u40 seem just fine (including various versions of JDK 6).
  • Plain old Java code (for example, running 1000*1000*1000*some_calc) does not have this performance issue, so it seems my code or the libraries it uses are doing something unexpected.
  • Tests were done using the same database instance (MSSQL 2008 R2), not that it should matter.
  • Even if OperatingSystemMXBean is unreliable, the tests' wall time is just as different.
  • In both cases GC seems to kick in at the same times and for the same durations (I've been using -XX:+UseSerialGC for testing).
  • Profiling shows no unusual new hotspots, though it's generally showing an application wide increase in execution time.
  • Testing the x86 versions of these Sun JDKs, or OpenJDK builds, does not change the result.
  • All code tested (except when running on JDK 6) was compiled using jdk1.7.0_40.
  • The same scenario has been tested on two different computers: x64 and x86.

Any tips or ideas?

Edited to add: The application is structured as an outer loop that runs financial Monte Carlo simulations, i.e. lots of dates, calculations, etc. As such, it's currently a bit complex and, I agree, not ideal for isolating the issue. I'll have to try to scale it down.

roded asked Nov 20 '14 10:11


1 Answer

It looks like the problem is due to the work done in JDK-7133857, in which java.lang.Math.pow() and java.lang.Math.exp() were intrinsified and are now computed using x87 instructions.

These methods are used extensively (!) in the profiled application, hence their considerable effect.

JDK-8029302 describes and fixes the issue for an exponent of 2 (i.e. Math.pow(x, 2)). Testing the application with jdk1.8.0_25, which includes that fix, shows improved performance, though not back to the level of jdk1.7.0_25 before the intrinsification.

Here are my JMH benchmarks and their results for Math.pow() on the three relevant JDK versions:

package org.sample;

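import org.openjdk.jmh.annotations.*;

public class MyBenchmark {

    // Note: despite the class name, Scope.Benchmark means one instance
    // is shared by all benchmark threads.
    // The fields are volatile so each invocation really reads and writes them.
    @State(Scope.Benchmark)
    public static class ThreadState {
        volatile double x = 0;
        volatile double y = 0;
    }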

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public double powx(ThreadState state) {
        state.x++;
        state.y += 0.5;
        return Math.pow(state.x, state.y);
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public double pow3(ThreadState state) {
        state.x++;
        return Math.pow(state.x, 3);
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public double pow2(ThreadState state) {
        state.x++;
        return Math.pow(state.x, 2);
    }
}

The results:

Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz

jdk1.7.0_25 - before intrinsification

# VM invoker: x:\@sdks\jdks\jdk1.7.0_25\jre\bin\java.exe
...
Result: 4877658.355 (99.9%) 330460.323 ops/s [Average]
  Statistics: (min, avg, max) = (1216417.493, 4877658.355, 6421780.276), stdev = 1399189.700
  Confidence interval (99.9%): [4547198.032, 5208118.678]


# Run complete. Total time: 00:24:48

Benchmark                Mode  Samples         Score  Score error  Units
o.s.MyBenchmark.pow2    thrpt      200  40160618.138  1561135.596  ops/s
o.s.MyBenchmark.pow3    thrpt      200   3678800.153    88678.269  ops/s
o.s.MyBenchmark.powx    thrpt      200   4877658.355   330460.323  ops/s

jdk1.7.0_40 - intrinsification

# VM invoker: x:\@sdks\jdks\jdk1.7.0_40\jre\bin\java.exe
...
Result: 1860849.245 (99.9%) 94303.387 ops/s [Average]
  Statistics: (min, avg, max) = (418909.582, 1860849.245, 2379936.035), stdev = 399286.444
  Confidence interval (99.9%): [1766545.859, 1955152.632]


# Run complete. Total time: 00:24:48

Benchmark                Mode  Samples        Score  Score error  Units
o.s.MyBenchmark.pow2    thrpt      200  9619333.987   230749.333  ops/s
o.s.MyBenchmark.pow3    thrpt      200  9240043.369   238456.949  ops/s
o.s.MyBenchmark.powx    thrpt      200  1860849.245    94303.387  ops/s

jdk1.8.0_25 - fixed intrinsification

# VM invoker: x:\@sdks\jdks\jdk1.8.0_25\jre\bin\java.exe
...
Result: 1898015.057 (99.9%) 92555.236 ops/s [Average]
  Statistics: (min, avg, max) = (649562.297, 1898015.057, 2359474.902), stdev = 391884.665
  Confidence interval (99.9%): [1805459.821, 1990570.293]


# Run complete. Total time: 00:24:37

Benchmark                Mode  Samples         Score  Score error  Units
o.s.MyBenchmark.pow2    thrpt      200  81840274.815  1979190.065  ops/s
o.s.MyBenchmark.pow3    thrpt      200   9441518.686   206612.404  ops/s
o.s.MyBenchmark.powx    thrpt      200   1898015.057    92555.236  ops/s

If I'm reading this right, the regression for an exponent of 2 (Math.pow(x, 2)) was indeed fixed by JDK-8029302, and performance for integer exponents greater than 2 (I only tested Math.pow(x, 3)) actually improved in jdk1.7.0_40. For the more general, non-integer exponents exercised by the powx() benchmark above, there still seems to be a considerable performance regression (about 2.6x by these numbers) when moving from jdk1.7.0_25 to jdk1.7.0_40.
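To make the comparison easier to check, here is a small sketch that computes the throughput ratios from the scores reported in the tables above. The class name PowRatios and its helper are made up for illustration; the numbers are copied verbatim from the JMH output.

```java
final class PowRatios {

    // Throughput scores (ops/s) copied from the JMH result tables.
    static final double POW2_U25 = 40160618.138, POW2_U40 = 9619333.987, POW2_8U25 = 81840274.815;
    static final double POW3_U25 = 3678800.153,  POW3_U40 = 9240043.369, POW3_8U25 = 9441518.686;
    static final double POWX_U25 = 4877658.355,  POWX_U40 = 1860849.245, POWX_8U25 = 1898015.057;

    /** How many times faster 'fast' is than 'slow' (both in ops/s). */
    static double ratio(double fast, double slow) {
        return fast / slow;
    }

    public static void main(String[] args) {
        // pow2: slower on u40, faster again on 8u25
        System.out.printf("pow2 u25 vs u40:  %.2fx%n", ratio(POW2_U25, POW2_U40));
        System.out.printf("pow2 8u25 vs u25: %.2fx%n", ratio(POW2_8U25, POW2_U25));
        // pow3: faster from u40 onwards
        System.out.printf("pow3 u40 vs u25:  %.2fx%n", ratio(POW3_U40, POW3_U25));
        // powx: slower on u40 and still slower on 8u25
        System.out.printf("powx u25 vs u40:  %.2fx%n", ratio(POWX_U25, POWX_U40));
        System.out.printf("powx u25 vs 8u25: %.2fx%n", ratio(POWX_U25, POWX_8U25));
    }
}
```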

Replacing Math.pow() and Math.exp() with their respective methods in org.apache.commons.math3.util.FastMath completely solves the problem with an increase in performance - this is the correct solution as far as I'm concerned.
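One way to make such a replacement a one-line change is to route all calls through a small facade. The following is only a sketch: the names PowFacade, Impl, and use() are hypothetical and not from the application, and the default implementation shown here is plain Math.pow() so that the snippet compiles without commons-math3 on the classpath.

```java
final class PowFacade {

    /** Hypothetical strategy interface so the pow implementation can be swapped. */
    interface Impl {
        double pow(double base, double exponent);
    }

    /** Default implementation: delegates to the JDK. */
    static final Impl JDK = new Impl() {
        @Override
        public double pow(double base, double exponent) {
            return Math.pow(base, exponent);
        }
    };

    private static volatile Impl impl = JDK;

    /** Installs a different implementation, e.g. one backed by FastMath. */
    static void use(Impl replacement) {
        impl = replacement;
    }

    /** Single entry point that application code would call instead of Math.pow(). */
    static double pow(double base, double exponent) {
        return impl.pow(base, exponent);
    }

    public static void main(String[] args) {
        System.out.println(pow(2.0, 10.0));
    }
}
```

With commons-math3 available, an Impl whose body is `return org.apache.commons.math3.util.FastMath.pow(base, exponent);` could be passed to use(). The extra indirection has its own cost, so it is worth benchmarking before relying on it in a hot loop.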

Note: This would have been somewhat simpler if there were an easy way (i.e. without having to build the JDK) to set the -XX:-InlineIntrinsics flag.

roded answered Nov 17 '22 10:11