
Why is the "new" keyword so much more efficient than assignment?

Tags: java, performance, new-operator

I've got two methods to read in a string, and create Character objects:

static void newChar(String string) {
    int len = string.length();
    System.out.println("Reading " + len + " characters");
    for (int i = 0; i < len; i++) {
        Character cur = new Character(string.charAt(i));
    }
}

and

static void justChar(String string) {
    int len = string.length();
    for (int i = 0; i < len; i++) {
        Character cur = string.charAt(i);
    }
}

When I run the methods using an 18,554,760 character string, I'm getting wildly different run times. The output I'm getting is:

newChar took: 20 ms
justChar took: 41 ms

With smaller input (4,638,690 characters) the time isn't as varied.

newChar took: 12 ms
justChar took: 13 ms

Why is new so much more efficient in this case?

EDIT:

My benchmark code is pretty hacky.

start = System.currentTimeMillis();
newChar(largeString);
end = System.currentTimeMillis();
diff = end-start;
System.out.println("New char took: " + diff + " ms");

start = System.currentTimeMillis();
justChar(largeString);
end = System.currentTimeMillis();
diff = end-start;
System.out.println("just char took: " + diff+ " ms");


asked Dec 27 '13 20:12

Steven Spasbo

2 Answers

Well, I'm not sure whether Marko replicated the original mistake intentionally. TL;DR: the new instance is never used, so it gets eliminated. Adjusting the benchmark reverses the result. Don't trust faulty benchmarks; learn from them.

Here's the JMH benchmark:

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
@State(Scope.Thread)
public class Chars {

    // Source needs to be a @State field to avoid constant optimizations
    // on sources. Results need to be sunk into the Blackhole to
    // avoid dead-code elimination
    private String string;

    @Setup
    public void setup() {
        string = "12345678901234567890";
        for (int i = 0; i < 10; i++) {
            string += string;
        }
    }

    @GenerateMicroBenchmark
    public void newChar_DCE(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = new Character(string.charAt(i));
        }
    }

    @GenerateMicroBenchmark
    public void justChar_DCE(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = Character.valueOf(string.charAt(i));
        }
    }

    @GenerateMicroBenchmark
    public void newChar(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = new Character(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void justChar(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = Character.valueOf(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void newChar_prim(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            char c = new Character(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void justChar_prim(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            char c = Character.valueOf(string.charAt(i));
            bh.consume(c);
        }
    }
}

...and this is the result:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Chars.justChar          avgt         9       93.051        0.365    us/op
o.s.Chars.justChar_DCE      avgt         9       62.018        0.092    us/op
o.s.Chars.justChar_prim     avgt         9       82.897        0.440    us/op
o.s.Chars.newChar           avgt         9      117.962        4.679    us/op
o.s.Chars.newChar_DCE       avgt         9       25.861        0.102    us/op
o.s.Chars.newChar_prim      avgt         9       41.334        0.183    us/op

DCE stands for "Dead Code Elimination", and that is what the original benchmark suffers from. If we eliminate that effect, which in JMH means sinking the values into the Blackhole, the score reverses. So, in retrospect, this seems to indicate that new Character() in the original code benefits greatly from DCE, while Character.valueOf does not benefit nearly as much. I'm not sure we should discuss why, because this has no bearing on real-world use cases, where the produced Characters are actually used.
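Outside JMH, the same idea can be sketched in plain Java: make the loop's result observable so that the JIT is not free to discard the work. This is only a minimal illustration, not a benchmark. The names `ConsumeSketch`, `unused` and `consumed` are made up for this example, and returning a checksum is just a rough stand-in for what the Blackhole does.

```java
class ConsumeSketch {
    // The boxed value is never read, so the JIT is free to drop the loop body.
    @SuppressWarnings({"deprecation", "removal", "unused"})
    static void unused(String s) {
        for (int i = 0; i < s.length(); i++) {
            Character c = new Character(s.charAt(i));
        }
    }

    // The boxed value feeds the return value, so the work stays observable.
    @SuppressWarnings({"deprecation", "removal"})
    static int consumed(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            Character c = new Character(s.charAt(i));
            sum += c; // auto-unboxing, equivalent to c.charValue()
        }
        return sum;
    }

    public static void main(String[] args) {
        unused("abc");
        System.out.println(consumed("abc")); // 97 + 98 + 99 = 294
    }
}
```

Note that `consumed` only uses the unboxed char, which corresponds to the *_prim variants in the benchmark rather than to the variants that hand the Character object itself to the Blackhole.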

You can go further on two fronts from here:

Get the assembly for the benchmark methods to confirm the conjecture above. See PrintAssembly.
Run with more threads. The difference between returning a cached Character and instantiating a new one would diminish as we increase the number of threads and consequently hit the "allocation wall".

UPD: Following up on Marko's question, it does seem the major impact is about eliminating the allocation itself, whether via the EA or DCE, see *_prim tests.

UPD2: Looked into the assembly. The same run with -XX:-DoEscapeAnalysis confirms that the major effect comes from escape analysis eliminating the allocation:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Chars.justChar          avgt         9       94.318        4.525    us/op
o.s.Chars.justChar_DCE      avgt         9       61.993        0.227    us/op
o.s.Chars.justChar_prim     avgt         9       82.824        0.634    us/op
o.s.Chars.newChar           avgt         9      118.862        1.096    us/op
o.s.Chars.newChar_DCE       avgt         9       97.530        2.485    us/op
o.s.Chars.newChar_prim      avgt         9      101.905        1.871    us/op

This proves that the original DCE conjecture is incorrect: EA is the major contributor. The DCE results are still faster because we do not pay the cost of unboxing, or of treating the returned value with any respect in general. The benchmark is nevertheless faulty in that regard.
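To make "escaping" concrete, here is a minimal sketch of the two code shapes involved. Whether HotSpot actually removes the allocation in `local` depends on the JIT and its flags (for example the -XX:-DoEscapeAnalysis run above), so the code shows only the shapes and proves nothing about timing. `EscapeSketch`, `local`, `published` and `last` are hypothetical names.

```java
class EscapeSketch {
    // Anything stored here is reachable from outside the method that created it.
    static Character last;

    // The Character never leaves this method: a candidate for escape analysis.
    @SuppressWarnings({"deprecation", "removal"})
    static int local(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += new Character(s.charAt(i)).charValue();
        }
        return sum;
    }

    // The Character is published through a static field, so it escapes
    // and has to exist as a real object.
    @SuppressWarnings({"deprecation", "removal"})
    static int published(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            Character c = new Character(s.charAt(i));
            last = c;
            sum += c.charValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(local("abc"));     // 294
        System.out.println(published("abc")); // 294
    }
}
```

Both methods compute the same sum; the only difference is whether the boxed object can be observed from elsewhere.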

answered Oct 19 '22 19:10

Aleksey Shipilev

TL;DR section

Good news

Your measurement does expose a real effect.

Bad news

It does so mostly by chance because your benchmark has many technical flaws, and the effect it exposes is probably not the one you have in mind.

The new Character() approach is faster if and only if HotSpot's Escape Analysis succeeds in proving that the resulting instance can be safely allocated on the stack instead of the heap. Therefore the effect is not nearly as general as your question implies.

Explanation of effect

The reason why new Character() is faster is locality of reference: your instance is on the stack and all access to it is via CPU cache hits. When you reuse a cached instance, you must

1. access a remote static field;
2. dereference it into a remote array;
3. dereference an array entry into a remote Character instance;
4. access the char contained in that instance.

Each dereference is a potential CPU cache miss. Furthermore, it forces a part of the cache to be redirected towards those remote locations, causing more cache misses on the input string and/or the stack locations.
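The chain of dereferences above can be pictured with a simplified model of a boxed-value cache. This is a sketch, not the JDK source: `Box`, `CACHE` and `BoxCacheSketch.valueOf` are hypothetical names, and the only JDK behavior relied on is the documented guarantee that `Character.valueOf` caches values in the range 0 to 127.

```java
class BoxCacheSketch {
    // Hypothetical stand-in for a boxed char.
    static final class Box {
        final char value;
        Box(char value) { this.value = value; }
    }

    // (1) static field -> (2) array -> (3) Box instance -> (4) char inside it.
    static final Box[] CACHE = new Box[128];
    static {
        for (int i = 0; i < CACHE.length; i++) CACHE[i] = new Box((char) i);
    }

    // Cached lookup for small values, fresh allocation otherwise.
    static Box valueOf(char c) {
        return c < CACHE.length ? CACHE[c] : new Box(c);
    }

    public static void main(String[] args) {
        System.out.println(valueOf('a') == valueOf('a'));                     // true: same cached instance
        System.out.println(new Box('a') == new Box('a'));                     // false: two fresh instances
        System.out.println(Character.valueOf('a') == Character.valueOf('a')); // true: cached range
    }
}
```

Reading `valueOf(c).value` walks all four steps, whereas a fresh `new Box(c)` that never escapes gives the JIT the option of keeping the char in a register.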

DETAILS

I have run this code with jmh:

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
public class Chars {
  static String string = "12345678901234567890";
  static {
    for (int i = 0; i < 10; i++) string += string;
  }

  @GenerateMicroBenchmark
  public void newChar() {
    int len = string.length();
    for (int i = 0; i < len; i++) new Character(string.charAt(i));
  }

  @GenerateMicroBenchmark
  public void justChar() {
    int len = string.length();
    for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i));
  }
}

This keeps the essence of your code, but eliminates some systematic errors, such as including warmup and compilation time in the measurement. These are the results:

Benchmark              Mode Thr    Cnt  Sec         Mean   Mean error    Units
o.s.Chars.justChar     avgt   1      3    5       39.062        6.587  usec/op
o.s.Chars.newChar      avgt   1      3    5       19.114        0.653  usec/op
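For readers without jmh at hand, the warmup part can be roughly approximated by hand. The sketch below is an assumption-laden illustration rather than a replacement for jmh: `WarmupSketch`, `sumBoxed`, `averageNanos` and `sink` are hypothetical names, the round counts are arbitrary, and a hand-rolled loop like this still leaves many of the pitfalls that jmh handles.

```java
import java.util.function.ToIntFunction;

class WarmupSketch {
    // Results are accumulated here so the measured work has an observable effect.
    static int sink;

    static int sumBoxed(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += Character.valueOf(s.charAt(i));
        return sum;
    }

    // Runs body untimed for `warmup` rounds so the JIT can compile it,
    // then returns the average time of the timed rounds in nanoseconds.
    static long averageNanos(ToIntFunction<String> body, String input, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) sink += body.applyAsInt(input);
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink += body.applyAsInt(input);
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        String input = "12345678901234567890";
        for (int i = 0; i < 10; i++) input += input;
        long ns = averageNanos(WarmupSketch::sumBoxed, input, 1_000, 1_000);
        System.out.println("sumBoxed: " + ns + " ns/op");
    }
}
```

The timing printed by `main` varies from machine to machine and run to run, so treat it as indicative at best.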

And this would be my best guess at what's going on:

in newChar you are creating a fresh instance of Character. HotSpot's Escape Analysis can prove the instance never escapes, therefore it allows stack allocation, or, in the special case of Character, could eliminate the allocation altogether because the data from it is provably never used;
in justChar you involve lookup into the Character cache array, which has some cost.

UPDATE

In response to Aleks's criticism, I added some more methods to the benchmark. The main effect remains stable, but we get even more fine-grained details about the lesser optimization effects.

  @GenerateMicroBenchmark
  public int newCharUsed() {
    int len = string.length(), sum = 0;
    for (int i = 0; i < len; i++) sum += new Character(string.charAt(i));
    return sum;
  }

  @GenerateMicroBenchmark
  public int justCharUsed() {
    int len = string.length(), sum = 0;
    for (int i = 0; i < len; i++) sum += Character.valueOf(string.charAt(i));
    return sum;
  }

  @GenerateMicroBenchmark
  public void newChar() {
    int len = string.length();
    for (int i = 0; i < len; i++) new Character(string.charAt(i));
  }

  @GenerateMicroBenchmark
  public void justChar() {
    int len = string.length();
    for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i));
  }

  @GenerateMicroBenchmark
  public void newCharValue() {
    int len = string.length();
    for (int i = 0; i < len; i++) new Character(string.charAt(i)).charValue();
  }

  @GenerateMicroBenchmark
  public void justCharValue() {
    int len = string.length();
    for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i)).charValue();
  }

DESCRIPTION:

the base versions are justChar and newChar;
...Value methods add the charValue call to the base version;
...Used methods add both the charValue call (implicitly) and use the value to preclude any Dead Code Elimination.

RESULTS:

Benchmark                   Mode Thr    Cnt  Sec         Mean   Mean error    Units
o.s.Chars.justChar          avgt   1      3    1      246.847        5.969  usec/op
o.s.Chars.justCharUsed      avgt   1      3    1      370.031       26.057  usec/op
o.s.Chars.justCharValue     avgt   1      3    1      296.342       60.705  usec/op
o.s.Chars.newChar           avgt   1      3    1      123.302       10.596  usec/op
o.s.Chars.newCharUsed       avgt   1      3    1      172.721        9.055  usec/op
o.s.Chars.newCharValue      avgt   1      3    1      123.040        5.095  usec/op

there is evidence of some Dead Code Elimination (DCE) both in justChar and newChar variants, but it is only partial;
with newChar variant, adding charValue has no effect so apparently it was DCE'd;
with justChar, charValue does have an effect, so seems not to have been eliminated;
DCE has a minor overall effect, as witnessed by the stable difference between newCharUsed and justCharUsed.

answered Oct 19 '22 20:10

Marko Topolnik
