Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 8: Class.getName() slows down String concatenation chain

Recently I've run into an issue regarding String concatenation. This benchmark summarizes it:

@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class BrokenConcatenationBenchmark {

  @Benchmark
  public String slow(Data data) {
    final Class<? extends Data> clazz = data.clazz;
    return "class " + clazz.getName();
  }

  @Benchmark
  public String fast(Data data) {
    final Class<? extends Data> clazz = data.clazz;
    final String clazzName = clazz.getName();
    return "class " + clazzName;
  }

  @State(Scope.Thread)
  public static class Data {
    final Class<? extends Data> clazz = getClass();

    @Setup
    public void setup() {
      //explicitly load name via native method Class.getName0()
      clazz.getName();
    }
  }
}

On JDK 1.8.0_222 (OpenJDK 64-Bit Server VM, 25.222-b10) I've got the following results:

Benchmark                                                            Mode  Cnt     Score     Error   Units
BrokenConcatenationBenchmark.fast                                    avgt   25    22,253 ±   0,962   ns/op
BrokenConcatenationBenchmark.fast:·gc.alloc.rate                     avgt   25  9824,603 ± 400,088  MB/sec
BrokenConcatenationBenchmark.fast:·gc.alloc.rate.norm                avgt   25   240,000 ±   0,001    B/op
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Eden_Space            avgt   25  9824,162 ± 397,745  MB/sec
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Eden_Space.norm       avgt   25   239,994 ±   0,522    B/op
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Survivor_Space        avgt   25     0,040 ±   0,011  MB/sec
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Survivor_Space.norm   avgt   25     0,001 ±   0,001    B/op
BrokenConcatenationBenchmark.fast:·gc.count                          avgt   25  3798,000            counts
BrokenConcatenationBenchmark.fast:·gc.time                           avgt   25  2241,000                ms

BrokenConcatenationBenchmark.slow                                    avgt   25    54,316 ±   1,340   ns/op
BrokenConcatenationBenchmark.slow:·gc.alloc.rate                     avgt   25  8435,703 ± 198,587  MB/sec
BrokenConcatenationBenchmark.slow:·gc.alloc.rate.norm                avgt   25   504,000 ±   0,001    B/op
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Eden_Space            avgt   25  8434,983 ± 198,966  MB/sec
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Eden_Space.norm       avgt   25   503,958 ±   1,000    B/op
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Survivor_Space        avgt   25     0,127 ±   0,011  MB/sec
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Survivor_Space.norm   avgt   25     0,008 ±   0,001    B/op
BrokenConcatenationBenchmark.slow:·gc.count                          avgt   25  3789,000            counts
BrokenConcatenationBenchmark.slow:·gc.time                           avgt   25  2245,000                ms

This looks like an issue similar to JDK-8043677, where an expression having side effect breaks optimization of new StringBuilder.append().append().toString() chain. But the code of Class.getName() itself does not seem to have any side effects:

private transient String name;

public String getName() {
  String name = this.name;
  if (name == null) {
    this.name = name = this.getName0();
  }

  return name;
}

private native String getName0();

The only suspicious thing here is a call to native method which happens in fact only once and its result is cached in the field of the class. In my benchmark I've explicitly cached it in setup method.

I've expected branch predictor to figure out that at each benchmark invocation the actual value of this.name is never null and optimize the whole expression.

However, while for the BrokenConcatenationBenchmark.fast() I have this:

@ 19   tsypanov.strings.benchmark.concatenation.BrokenConcatenationBenchmark::fast (30 bytes)   force inline by CompileCommand
  @ 6   java.lang.Class::getName (18 bytes)   inline (hot)
    @ 14   java.lang.Class::initClassName (0 bytes)   native method
  @ 14   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
  @ 19   java.lang.StringBuilder::append (8 bytes)   inline (hot)
  @ 23   java.lang.StringBuilder::append (8 bytes)   inline (hot)
  @ 26   java.lang.StringBuilder::toString (35 bytes)   inline (hot)

i.e. compiler is able to inline everything, for BrokenConcatenationBenchmark.slow() it is different:

@ 19   tsypanov.strings.benchmark.concatenation.BrokenConcatenationBenchmark::slow (28 bytes)   force inline by CompilerOracle
  @ 9   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
    @ 3   java.lang.AbstractStringBuilder::<init> (12 bytes)   inline (hot)
      @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
  @ 14   java.lang.StringBuilder::append (8 bytes)   inline (hot)
    @ 2   java.lang.AbstractStringBuilder::append (50 bytes)   inline (hot)
      @ 10   java.lang.String::length (6 bytes)   inline (hot)
      @ 21   java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)   inline (hot)
        @ 17   java.lang.AbstractStringBuilder::newCapacity (39 bytes)   inline (hot)
        @ 20   java.util.Arrays::copyOf (19 bytes)   inline (hot)
          @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
          @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)
      @ 35   java.lang.String::getChars (62 bytes)   inline (hot)
        @ 58   java.lang.System::arraycopy (0 bytes)   (intrinsic)
  @ 18   java.lang.Class::getName (21 bytes)   inline (hot)
    @ 11   java.lang.Class::getName0 (0 bytes)   native method
  @ 21   java.lang.StringBuilder::append (8 bytes)   inline (hot)
    @ 2   java.lang.AbstractStringBuilder::append (50 bytes)   inline (hot)
      @ 10   java.lang.String::length (6 bytes)   inline (hot)
      @ 21   java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)   inline (hot)
        @ 17   java.lang.AbstractStringBuilder::newCapacity (39 bytes)   inline (hot)
        @ 20   java.util.Arrays::copyOf (19 bytes)   inline (hot)
          @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
          @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)
      @ 35   java.lang.String::getChars (62 bytes)   inline (hot)
        @ 58   java.lang.System::arraycopy (0 bytes)   (intrinsic)
  @ 24   java.lang.StringBuilder::toString (17 bytes)   inline (hot)

So the question is whether this is appropriate behaviour of the JVM or compiler bug?

I'm asking the question because some of the projects are still using Java 8 and if it won't be fixed on any of release updates then to me it's reasonable to hoist calls to Class.getName() manually from hot spots.

P.S. On the latest JDKs (11, 13, 14-eap) the issue is not reproduced.

like image 322
Sergey Tsypanov Avatar asked Dec 03 '19 12:12

Sergey Tsypanov


People also ask

Is string concatenation slow in Java?

Let me say the reason that string concatenation is slow is because strings are immutable. This means every time you write "+=", a new String is created. This means the way you build up your string is in the worst case, O(n2).

Why is string concatenation slow?

The reason is that String is immutable; its value does not change. When adding a string, we create a new string in memory. StringBuilder is mutable, so when we use the append, its value changes, not a new string is created. Therefore using StringBuilder will save memory and run faster.

Which is faster string concatenation or the StringBuilder class?

Note that regular string concatenations are faster than using the StringBuilder but only when you're using a few of them at a time. If you are using two or three string concatenations, use a string.


2 Answers

HotSpot JVM collects execution statistics per bytecode. If the same code is run in different contexts, the result profile will aggregate statistics from all contexts. This effect is known as profile pollution.

Class.getName() is obviously called not only from your benchmark code. Before JIT starts compiling the benchmark, it already knows that the following condition in Class.getName() was met multiple times:

    if (name == null)
        this.name = name = getName0();

At least, enough times to treat this branch statistically important. So, JIT did not exclude this branch from compilation, and thus could not optimize string concat due to possible side effect.

This does not even need to be a native method call. Just a regular field assignment is also considered a side effect.

Here is an example how profile pollution can harm further optimizations.

@State(Scope.Benchmark)
public class StringConcat {
    private final MyClass clazz = new MyClass();

    static class MyClass {
        private String name;

        public String getName() {
            if (name == null) name = "ZZZ";
            return name;
        }
    }

    @Param({"1", "100", "400", "1000"})
    private int pollutionCalls;

    @Setup
    public void setup() {
        for (int i = 0; i < pollutionCalls; i++) {
            new MyClass().getName();
        }
    }

    @Benchmark
    public String fast() {
        String clazzName = clazz.getName();
        return "str " + clazzName;
    }

    @Benchmark
    public String slow() {
        return "str " + clazz.getName();
    }
}

This is basically the modified version of your benchmark that simulates the pollution of getName() profile. Depending on the number of preliminary getName() calls on a fresh object, the further performance of string concatenation may dramatically differ:

Benchmark          (pollutionCalls)  Mode  Cnt   Score   Error  Units
StringConcat.fast                 1  avgt   15  11,458 ± 0,076  ns/op
StringConcat.fast               100  avgt   15  11,690 ± 0,222  ns/op
StringConcat.fast               400  avgt   15  12,131 ± 0,105  ns/op
StringConcat.fast              1000  avgt   15  12,194 ± 0,069  ns/op
StringConcat.slow                 1  avgt   15  11,771 ± 0,105  ns/op
StringConcat.slow               100  avgt   15  11,963 ± 0,212  ns/op
StringConcat.slow               400  avgt   15  26,104 ± 0,202  ns/op  << !
StringConcat.slow              1000  avgt   15  26,108 ± 0,436  ns/op  << !

More examples of profile pollution »

I can't call it either a bug or an "appropriate behaviour". This is just how dynamic adaptive compilation is implemented in HotSpot.

like image 125
apangin Avatar answered Sep 28 '22 12:09

apangin


Slightly unrelated but since Java 9 and JEP 280: Indify String Concatenation the string concatenation is now done with invokedynamic and not StringBuilder. This article shows the differences in the bytecode between Java 8 and Java 9.

If the benchmark re-run on newer Java version doesn't show the problem there is most likley no bug in javac because the compiler now uses new mechanism. Not sure if diving into Java 8 behavior is beneficial if there is such a substantial change in the newer versions.

like image 35
Karol Dowbecki Avatar answered Sep 28 '22 12:09

Karol Dowbecki