Does Javac's StringBuilder optimisation do more harm than good?

Tags:

java

string

optimization

stringbuilder

javac

Let's say we have some code such as the following:

public static void main(String[] args) {
    String s = "";
    for(int i=0 ; i<10000 ; i++) {
        s += "really ";
    }
    s += "long string.";
}

(Yes, I know a far better implementation would use a StringBuilder, but bear with me.)

Trivially, we might expect the bytecode produced to be something akin to the following:

public static void main(java.lang.String[]);
Code:
   0: ldc           #2                  // String 
   2: astore_1      
   3: iconst_0      
   4: istore_2      
   5: iload_2       
   6: sipush        10000
   9: if_icmpge     25
  12: aload_1       
  13: ldc           #3                  // String really 
  15: invokevirtual #4                  // Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  18: astore_1      
  19: iinc          2, 1
  22: goto          5
  25: aload_1       
  26: ldc           #5                  // String long string.
  28: invokevirtual #4                  // Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  31: astore_1      
  32: return

Instead, the compiler tries to be a bit smarter - rather than using the concat method, it has a baked-in optimisation that uses StringBuilder objects, so we get the following:

public static void main(java.lang.String[]);
Code:
   0: ldc           #2                  // String 
   2: astore_1      
   3: iconst_0      
   4: istore_2      
   5: iload_2       
   6: sipush        10000
   9: if_icmpge     38
  12: new           #3                  // class java/lang/StringBuilder
  15: dup           
  16: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
  19: aload_1       
  20: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  23: ldc           #6                  // String really 
  25: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  28: invokevirtual #7                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  31: astore_1      
  32: iinc          2, 1
  35: goto          5
  38: new           #3                  // class java/lang/StringBuilder
  41: dup           
  42: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
  45: aload_1       
  46: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  49: ldc           #8                  // String long string.
  51: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  54: invokevirtual #7                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  57: astore_1      
  58: return
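
To reproduce a listing like the one above, something along these lines should work. The class name ConcatLoop and the build helper are assumptions made for illustration; javap -c is the standard JDK disassembler. The exact instructions depend on the javac version, and the listing in the question is from 8u5.

```java
// Hypothetical class name. Compile and disassemble with:
//   javac ConcatLoop.java
//   javap -c ConcatLoop
final class ConcatLoop {
    // Same loop as in the question, parameterised and returning the result
    static String build(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "really ";
        }
        s += "long string.";
        return s;
    }

    public static void main(String[] args) {
        // Print something derived from the result so the work is observable
        System.out.println(build(10000).length());
    }
}
```

Returning the string from a helper, rather than discarding it in main, also makes the method easier to reuse in timing experiments.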

However, this seems rather counter-productive to me - instead of using one string builder for the entire loop, one is created for each single concatenation operation, making it equivalent to the following:

public static void main(String[] args) {
    String s = "";
    for(int i=0 ; i<10000 ; i++) {
        s = new StringBuilder().append(s).append("really ").toString();
    }
    s = new StringBuilder().append(s).append("long string.").toString();
}

So now instead of the original trivial bad approach of just creating lots of String objects and throwing them away, the compiler has produced a far worse approach: creating lots of String objects, lots of StringBuilder objects, calling more methods, and still throwing them all away to generate the same output as without this optimisation.

So the question has to be - why? I understand that in cases like this:

String s = getString1() + getString2() + getString3();

...the compiler will create just one StringBuilder object for all three strings, so there are cases where the optimisation is useful. However, examining the bytecode reveals that even separating the above case into the following:

String s = getString1();
s += getString2();
s += getString3();

...means that we're back with the case where a separate StringBuilder object is created for each += statement. I'd understand if these were odd corner cases, but appending to strings in this way (and in a loop) is a really rather common operation.
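
Spelled out in source form, the two cases look roughly like the sketch below on the compiler version discussed here. The getString1() to getString3() bodies are placeholders standing in for the question's methods, and the class name is made up.

```java
final class SplitConcat {
    // Placeholder methods standing in for getString1..3 from the question
    static String getString1() { return "a"; }
    static String getString2() { return "b"; }
    static String getString3() { return "c"; }

    // Single expression: one builder covers all three operands
    static String singleExpression() {
        return new StringBuilder()
                .append(getString1())
                .append(getString2())
                .append(getString3())
                .toString();
    }

    // Separate += statements: each statement gets its own builder,
    // and each toString() produces an intermediate String
    static String separateStatements() {
        String s = getString1();
        s = new StringBuilder().append(s).append(getString2()).toString();
        s = new StringBuilder().append(s).append(getString3()).toString();
        return s;
    }
}
```

Both methods return the same text; the difference is only in how many temporary objects are created along the way.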

Surely it would be trivial to determine, at compile time, if a compiler-generated StringBuilder only ever appended one value - and if this was the case, use a simple concat operation instead?
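
One wrinkle with swapping in concat mechanically, sketched below: += and String.concat treat null differently, so the two are not drop-in equivalents in every case. The class and method names are illustrative only.

```java
final class NullConcat {
    // += converts a null operand to the text "null"
    static String plusEquals(String left, String right) {
        String s = left;
        s += right;
        return s;
    }

    // String.concat throws NullPointerException if either side is null
    static boolean concatThrows(String left, String right) {
        try {
            left.concat(right);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

So a compiler that emitted concat for a single += would also need to preserve the null handling, for example by only doing so when both operands are known to be non-null.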

This is all with 8u5 (although it goes back to at least Java 5, and probably earlier). FWIW, my benchmarks (unsurprisingly) put the manual concat() approach 2-3 times faster than using += in a loop with 10,000 elements. Of course, using a manual StringBuilder is always the preferable approach, but surely the compiler shouldn't adversely affect the performance of the += approach either?
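
For anyone wanting to try the comparison, a rough timing sketch follows. This is not the benchmark used for the figure above; it is a minimal wall-clock comparison with a short warm-up, and a proper harness would give more trustworthy numbers. All names here are assumptions.

```java
final class ConcatTiming {
    // The += form from the question
    static String viaPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "really ";
        }
        s += "long string.";
        return s;
    }

    // The hand-written concat() form
    static String viaConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s.concat("really ");
        }
        return s.concat("long string.");
    }

    // Wall-clock time for one call, in nanoseconds
    static long timeNanos(java.util.function.IntFunction<String> f, int n) {
        long start = System.nanoTime();
        String result = f.apply(n);
        long elapsed = System.nanoTime() - start;
        // Touch the result so the call cannot be treated as dead code
        if (result.isEmpty()) {
            throw new IllegalStateException("unexpected empty result");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10000;
        // A few untimed rounds so both methods have a chance to be JIT compiled
        for (int i = 0; i < 3; i++) {
            viaPlusEquals(n);
            viaConcat(n);
        }
        System.out.println("+=     : " + timeNanos(ConcatTiming::viaPlusEquals, n) / 1_000_000.0 + " ms");
        System.out.println("concat : " + timeNanos(ConcatTiming::viaConcat, n) / 1_000_000.0 + " ms");
    }
}
```

Results from a single timed run like this vary a lot between machines and JVM versions, so treat them as indicative at best.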

962

asked Jul 07 '14 15:07

Michael Berry

1 Answer

So the question has to be - why?

It is not clear why they don't optimize this a bit better in the bytecode compiler. You would need to ask the Oracle Java compiler team.

One possible explanation is that there may be code in the HotSpot JIT compiler to optimize the bytecode sequence into something better. (If you were curious, you could modify the code so that it got JIT compiled ... and then capture and examine the native code. However, you might actually find that the JIT compiler optimizes away the method body entirely ...)
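
A minimal sketch of such an experiment, assuming a HotSpot JVM: the loop is moved into a method that returns its result, and that method is called often enough to become hot. The class name and iteration counts are arbitrary. -XX:+PrintCompilation lists compiled methods; -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly prints native code but needs the hsdis disassembler plugin installed.

```java
// Run with, for example:
//   java -XX:+PrintCompilation JitProbe
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly JitProbe
final class JitProbe {
    // The question's loop, returning the string so the work has a visible result
    static String build(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "really ";
        }
        s += "long string.";
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call repeatedly so HotSpot is likely to consider build() hot
        for (int round = 0; round < 2000; round++) {
            total += build(200).length();
        }
        // Printing a value derived from every call keeps the calls observable
        System.out.println(total);
    }
}
```

Using the result of every call is what guards against the method body being optimized away, as mentioned above.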

Another possible explanation is that the original Java code is so pessimal to start with that they figured that optimizing it would not have a significant effect. Consider that a seasoned Java programmer would write it as:

public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i=0 ; i<10000 ; i++) {
        sb.append("really ");
    }
    sb.append("long string.");
    String s = sb.toString();
}

That is going to run roughly 4 orders of magnitude faster.
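
To see where the gap comes from, here is a back-of-the-envelope count of characters copied, not a measurement. It assumes each += produces a fresh string containing everything so far, and it ignores builder resizing, the extra copy in toString(), and garbage collection. The names are made up for the sketch.

```java
final class CopyCount {
    // Characters written into result strings by repeating `s += piece` n times:
    // iteration i yields a string of i * pieceLen chars, so the total is
    // pieceLen * (1 + 2 + ... + n)
    static long plusEqualsChars(int n, int pieceLen) {
        return (long) pieceLen * n * (n + 1L) / 2;
    }

    // Characters appended to a single StringBuilder: each piece is copied once
    // (internal buffer growth is ignored here)
    static long builderChars(int n, int pieceLen) {
        return (long) pieceLen * n;
    }

    public static void main(String[] args) {
        int n = 10000;
        int pieceLen = "really ".length(); // 7
        long quadratic = plusEqualsChars(n, pieceLen);
        long linear = builderChars(n, pieceLen);
        System.out.println("+= copies about      " + quadratic + " chars");
        System.out.println("builder copies about " + linear + " chars");
        System.out.println("ratio about          " + quadratic / linear);
    }
}
```

The += version grows quadratically with the number of iterations while the single builder grows linearly; the actual speedup on a given JVM also depends on allocation and GC costs.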

UPDATE - I used the code link from the linked Q&A to find the actual place in the Java bytecode compiler source that generates that code: here.

There are no hints in the source to explain the "dumb"-ness of the code generation strategy.

So to your general Question:

Does Javac's StringBuilder optimisation do more harm than good?

No.

My understanding is that the compiler developers did extensive benchmarking to determine that (overall) the StringBuilder optimizations are worthwhile.

You have found an edge case in a badly written program that could be optimized better (it is hypothesized). This is not sufficient to conclude the optimization "does more harm than good" overall.

140

answered Nov 06 '22 06:11

Stephen C
