
Why is String.Concat not optimized to StringBuilder.Append?


I found concatenations of constant string expressions are optimized by the compiler into one string.

Now, for strings only known at run time, why does the compiler not optimize concatenation in loops, or concatenations of, say, more than 10 strings, to use StringBuilder.Append instead? I mean, it's possible, right? Instantiate a StringBuilder and turn each concatenation into an Append() call.

Is there any reason why this should or could not be optimized? What am I missing?

Wim asked Feb 01 '10 14:02



2 Answers

The definitive answer will have to come from the compiler design team. But let me take a stab here...

If your question is, why the compiler doesn't turn this:

    string s = "";
    for( int i = 0; i < 100; i++ )
        s = string.Concat( s, i.ToString() );

into this:

    StringBuilder sb = new StringBuilder();
    for( int i = 0; i < 100; i++ )
        sb.Append( i.ToString() );
    string s = sb.ToString();

The most likely answer is that this is not an optimization. This is a rewrite of the code that introduces new constructs based on knowledge and intent that the developer has - not the compiler.

This type of change would require the compiler to have more knowledge of the BCL than is appropriate. What if tomorrow, some more optimal string assembly service becomes available? Should the compiler use that?

What if your loop conditions were more complicated? Should the compiler attempt to perform some static analysis to decide whether the result of such a rewrite would still be functionally equivalent? In many ways, this would be like solving the halting problem.
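To make the equivalence problem concrete, here is a minimal sketch of a loop where each intermediate value of `s` is observed by other code. The class and method names (`ObservedConcatDemo`, `BuildAndObserve`) are hypothetical and not from the question. A mechanical rewrite to StringBuilder would still have to produce a real string on every iteration, so it would need to call ToString() inside the loop and would lose most of the benefit.

```csharp
using System.Collections.Generic;

static class ObservedConcatDemo
{
    // Hypothetical helper: collects every intermediate value of s.
    public static List<string> BuildAndObserve(int count)
    {
        var seen = new List<string>();
        string s = "";
        for (int i = 0; i < count; i++)
        {
            s = string.Concat(s, i.ToString());
            // The intermediate string escapes the loop body here, so it
            // must exist as an actual string on every iteration.
            seen.Add(s);
        }
        return seen;
    }
}
```

Calling `ObservedConcatDemo.BuildAndObserve(3)` yields the three strings "0", "01" and "012", none of which a compiler could skip materializing.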

Finally, I'm not sure that in all cases this would result in faster performing code. There is a cost to instantiating a StringBuilder and resizing its internal buffer as text is appended. In fact, the cost of appending is strongly tied to the size of the strings being concatenated, how many there are, and what memory pressure looks like. These are things that the compiler cannot predict in advance.
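Since the compiler cannot predict these factors, one option is to measure them for your own workload. The following is only a rough sketch using Stopwatch; the names `ConcatTiming`, `WithConcat`, `WithBuilder` and `Time` are made up for illustration, and a serious comparison would need warm-up runs and repeated measurements.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConcatTiming
{
    // Builds "0123..." by repeated concatenation, as in the first loop above.
    public static string WithConcat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++)
            s = string.Concat(s, i.ToString());
        return s;
    }

    // Builds the same string with a StringBuilder, as in the second loop above.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.Append(i.ToString());
        return sb.ToString();
    }

    // Times a single call; one run is indicative at best.
    public static TimeSpan Time(Func<int, string> build, int n)
    {
        var sw = Stopwatch.StartNew();
        build(n);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, compare `ConcatTiming.Time(ConcatTiming.WithConcat, n)` with `ConcatTiming.Time(ConcatTiming.WithBuilder, n)` for a few values of n. The results depend on n, the size of the pieces, the runtime and the state of the garbage collector.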

It's your job as a developer to write well-performing code. The compiler can only help by making certain safe, invariant-preserving optimizations. Not rewriting your code for you.

LBushkin answered Sep 26 '22 02:09


LBushkin's answer is excellent; I have just a couple of things to add.

First, JScript.NET does do this optimization. JScript is frequently used by less-experienced programmers for tasks that involve construction of large strings in loops, like building up JSON objects, HTML data, and so on.

Since those programmers might not be aware of the n-squared cost of naive string allocation, might not be aware of the existence of string builders, and frequently write code using this pattern, we felt that it was reasonable to put this optimization into JScript.NET.
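As a rough illustration of the n-squared cost, the sketch below counts how many characters get written when a string is built by repeated `s = s + piece`, assuming every step allocates a new string and copies the old contents plus the new piece. `ConcatCost` and `CharsWrittenNaive` are hypothetical names, and the model ignores allocator and garbage collector overhead.

```csharp
static class ConcatCost
{
    // Total characters written across all intermediate strings when
    // appending `pieces` chunks of `pieceLength` characters one at a time.
    public static long CharsWrittenNaive(int pieces, int pieceLength)
    {
        long total = 0;
        long currentLength = 0;
        for (int i = 0; i < pieces; i++)
        {
            currentLength += pieceLength; // length of the newly allocated string
            total += currentLength;       // every character of it is written
        }
        return total; // equals pieceLength * pieces * (pieces + 1) / 2
    }
}
```

Under this model, 1,000 one-character pieces write 500,500 characters to produce a 1,000-character result, and doubling the number of pieces roughly quadruples the work.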

C# programmers tend to be more aware of the underlying costs of the code they write and more aware of the existence of off-the-shelf parts like StringBuilder, so they need this optimization less. And more fundamentally, the design philosophy of C# is that it is a "do what I said" language with a minimum of "magic"; JScript is a "do what I mean" language that does its best to figure out how to best serve you, even if that means sometimes guessing wrong. Both philosophies are valid and useful.

Sometimes it does "go the other way". Compare this choice to the choice we make for switches on strings. Switches on strings are actually compiled as a creation of a dictionary containing the strings, rather than as a series of string comparisons. That optimization could be bad; it might be faster to simply do the string comparisons. But here we make a guess that you "meant" the switch to be a table lookup rather than a series of "if" statements -- if you'd meant the series of if statements, you could easily write that yourself.
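To show the idea only, here is a switch on strings next to a hand-written table lookup that behaves the same way. This is a sketch, not the code the compiler actually emits, which varies by compiler version and by the number of cases. The names `SwitchSketch`, `ViaSwitch` and `ViaTable` are hypothetical.

```csharp
using System.Collections.Generic;

static class SwitchSketch
{
    // What the programmer writes.
    public static int ViaSwitch(string color)
    {
        switch (color)
        {
            case "red": return 1;
            case "green": return 2;
            case "blue": return 3;
            default: return 0;
        }
    }

    // Hand-written approximation of the "table lookup" reading:
    // the table is built once, then each call is a single lookup.
    private static readonly Dictionary<string, int> Table =
        new Dictionary<string, int>
        {
            { "red", 1 }, { "green", 2 }, { "blue", 3 }
        };

    public static int ViaTable(string color)
    {
        // Dictionary keys cannot be null, so handle null like the default case.
        if (color != null && Table.TryGetValue(color, out int value))
            return value;
        return 0;
    }
}
```

With only three cases, the series of comparisons may well be faster than the lookup, which is the trade-off described above.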

Eric Lippert answered Sep 23 '22 02:09