Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 11 string concatenation performance Vs java 8

Does anyone know why I get such different performance when running this code on Java 8 and Java 11?

Without using any runtime flags, it appears that this code runs significantly slower under Java 11 compared to Java 8.

import java.util.Date;

public class PerformanceExperiment {
    public static volatile String s = "";

    public static void main(String[] args)
    {                         
        System.out.println("Starting performance test");
        String s1 = "STRING ONE";
        String s2 = "STRING TWO";
        long now1 = (new Date()).getTime();
        for (long i = 0; i < 1_000_000_00; i++)
        {
            s = "abc " + s1 + " def " + s2;
        }
        long now2 = (new Date()).getTime();
        System.out.println("initial block took " + (now2 - now1) + "ms");
        for (long i = 0; i < 4_000_000_00; i++)
        {
            s = "abc " + s1 + " def " + s2;
        }
        long now3 = (new Date()).getTime();
        System.out.println("Main block took " + (now3 - now2) + "ms");
    }
}

I have tried numerous command line flags but have not managed to find anything that matches the Java 8 performance.

I have only tested this on Windows so it could behave differently on other operating systems.

like image 232
Alex Avatar asked Mar 28 '19 11:03

Alex


2 Answers

I modified your app to

  1. Use System.nanoTime() instead of new Date() for greater precision (see this answer for more info: https://stackoverflow.com/a/1776053/963076).
  2. Use Netbeans profiler.
  3. Loop through 10 iterations

Using Netbeans 8.2 with JDK 8 v181:

Starting performance test 0
initial block took 3147ms
Main block took 9469ms
Starting performance test 1
initial block took 2398ms
Main block took 9601ms
Starting performance test 2
initial block took 2463ms
Main block took 9671ms
Starting performance test 3
initial block took 2464ms
Main block took 9565ms
Starting performance test 4
initial block took 2410ms
Main block took 9672ms
Starting performance test 5
initial block took 2418ms
Main block took 9598ms
Starting performance test 6
initial block took 2384ms
Main block took 9733ms
Starting performance test 7
initial block took 2402ms
Main block took 9610ms
Starting performance test 8
initial block took 2509ms
Main block took 11222ms
Starting performance test 9
initial block took 2455ms
Main block took 10661ms

And the profiler showed this telemetry:

enter image description here

With Netbeans 10.0 using JDK 11.0.2:

Starting performance test 0
initial block took 3760ms
Main block took 15056ms
Starting performance test 1
initial block took 3734ms
Main block took 14602ms
Starting performance test 2
initial block took 3615ms
Main block took 14762ms
Starting performance test 3
initial block took 3748ms
Main block took 14534ms
Starting performance test 4
initial block took 3628ms
Main block took 14759ms
Starting performance test 5
initial block took 3625ms
Main block took 14959ms
Starting performance test 6
initial block took 3987ms
Main block took 14967ms
Starting performance test 7
initial block took 3803ms
Main block took 14701ms
Starting performance test 8
initial block took 3599ms
Main block took 14762ms
Starting performance test 9
initial block took 3627ms
Main block took 14434ms

enter image description here

My conclusion: JDK 11 is doing much more work to be memory efficient. Notice the # of "surviving generations" in the garbage collector is way (way) less with JDK11 and the memory usage is also significantly less in terms of usage and volatility. The trade-off seems to be in speed, but the difference in speed is less than the difference in memory usage.

like image 197
ryvantage Avatar answered Nov 15 '22 00:11

ryvantage


TL;DR: Need better benchmarks, better setup with controlling the differences between releases, etc. Most of the benchmarking issues would be trivially solved by using JMH. Current test behavior seems to be explained by questionable benchmarking approach and the change in default GC.

Consider this:

public class PerformanceExperiment {
    public static volatile String s = "";

    public static void main(String[] args) {
        for (int c = 0; c < 5; c++) {
            test();
        }
    }

    public static void test() {
        String s1 = "STRING ONE";
        String s2 = "STRING TWO";
        long time1 = System.currentTimeMillis();
        for (long i = 0; i < 4_000_000_00; i++) {
            s = "abc " + s1 + " def " + s2;
        }
        long time2 = System.currentTimeMillis();
        System.out.println("Main block took " + (time2 - time1) + "ms");
    }
}

First, it uses more convenient timings. Then, it measures the same bytecode block, while original tests warms up "initial one" and then proceeds to measure the absolutely cold one.

Then, JIT compilation would hit the method and you want to re-enter the method again to get the optimized code running, otherwise you are running the intermediate "on-stack-replacement" code -- you can do it with external iteration that calls into test. And on top of that, you want to enter multiple times to capture the most optimized version.

And, since the test allocates a lot, you want to nail the heap size.

So, here:

$ ~/Install/jdk8u191-rh/bin/javac PerformanceExperiment.java
$ ~/Install/jdk8u191-rh/bin/java -Xms2g -Xmx2g PerformanceExperiment
Main block took 10024ms
Main block took 9768ms
Main block took 7249ms
Main block took 7235ms
Main block took 7205ms

...and here is 11.0.2 on the same bytecode:

$ ~/Install/jdk11.0.2/bin/java -Xms2g -Xmx2g PerformanceExperiment
Main block took 9775ms
Main block took 10825ms
Main block took 8635ms
Main block took 8616ms
Main block took 8622ms

...and here is 11.0.2 with matching GC (9+ changed the default to G1 with JEP 248):

$ ~/Install/jdk11.0.2/bin/java -Xms2g -Xmx2g -XX:+UseParallelGC PerformanceExperiment
Main block took 9281ms
Main block took 9129ms
Main block took 6725ms
Main block took 6688ms
Main block took 6684ms

On top of that, there is volatile store on every tiny iteration, that costs quite a lot and probably distorts the benchmark.

There is also interaction with indified string concat (JEP 280), thread-local handshakes (JEP 312), and other VM fixes, but you would probably see that only when compiling past target=8, which goes beyond the scope of this exercise.

like image 31
Aleksey Shipilev Avatar answered Nov 15 '22 01:11

Aleksey Shipilev