C#: Why does .ToString() append text faster to an int converted to string?

Tags:

c#

This is from the book C# in a Nutshell:

StringBuilder sb = new StringBuilder();
for(int i = 0; i < 50; i++) 
     sb.Append (i + ",");

//Outputs 0,1,2,3.............49,

However, it then says: "the expression i + "," means that we are still repeatedly concatenating strings, however this only incurs a small performance cost as strings are small"

Then it says that changing it to the lines below makes it faster:

for(int i = 0; i < 50; i++) {
    sb.Append(i.ToString()); 
    sb.Append(",");
}

But why is that faster? Now we have an extra step where i is being converted to a string? What is actually going on under the hood here? There isn't any more explanation in the rest of the chapter.

348

asked Aug 17 '13 22:08

iAteABug_And_iLiked_it

2 Answers

The first two answers to your question are not quite correct. The sb.Append(i + ","); statement does not call i.ToString() directly; what it actually does is:

sb.Append(string.Concat((object)i, (object)","));

Internally, string.Concat calls ToString() on the two objects passed in. The key performance concern in this statement is (object)i. This is boxing: wrapping a value type inside a reference type object. This is a (relatively) sizable performance hit, as it takes extra cycles and a memory allocation to box something, and the box then has to be garbage collected.

You can see this happening in the IL of the (Release) compiled code:

IL_000c:  box        [mscorlib]System.Int32
IL_0011:  ldstr      ","
IL_0016:  call       string [mscorlib]System.String::Concat(object,
                                                            object)
IL_001b:  callvirt   instance class [mscorlib]System.Text.StringBuilder 
                     [mscorlib]System.Text.StringBuilder::Append(string)

Note that the first line is a box instruction, followed by a Concat call, and finally the call to Append.

If you call i.ToString() instead, as shown below, you forgo both the boxing and the string.Concat() call.
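As a rough C# equivalent of the IL above, the concatenation can be written out by hand. This is only a sketch, not compiler output: the names LoweringSketch and AppendBoxed are made up for illustration, and a different compiler version may lower i + "," differently.

```csharp
using System;
using System.Text;

public static class LoweringSketch
{
    // Hand-written equivalent of sb.Append(i + ",") as described above.
    // LoweringSketch and AppendBoxed are hypothetical names.
    public static string AppendBoxed(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            object boxed = i;                                  // box: one heap allocation per iteration
            string piece = string.Concat(boxed, (object)",");  // Concat(object, object) calls ToString() on both
            sb.Append(piece);                                  // Append(string)
        }
        return sb.ToString();
    }
}
```

Each iteration allocates the box and the temporary "i," string in addition to the string produced by ToString().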

for (int i = 0; i < 50; i++)
{
    sb.Append(i.ToString());
    sb.Append(",");
}

This call yields the following IL:

IL_000b:  ldloca.s   i
IL_000d:  call       instance string [mscorlib]System.Int32::ToString()
IL_0012:  callvirt   instance class [mscorlib]System.Text.StringBuilder
                     [mscorlib]System.Text.StringBuilder::Append(string)
IL_0017:  pop
IL_0018:  ldloc.0
IL_0019:  ldstr      ","
IL_001e:  callvirt   instance class [mscorlib]System.Text.StringBuilder
                     [mscorlib]System.Text.StringBuilder::Append(string)

Note that there is no boxing and no String.Concat, so fewer objects are created that need to be collected and fewer cycles are spent on boxing, at the cost of one additional Append() call, which is much cheaper by comparison.

This is why the second set of code performs better.

You can extend this idea to many other things. Anywhere you are operating on strings and passing a value type into a function that does not explicitly take that type as an argument (calls that take an object, such as string.Format()), it's a good idea to call <valuetype>.ToString() on the argument first.
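If you want to check this on your own machine, a minimal timing sketch might look like the following. The class and method names (AppendTiming, WithConcat, WithToString, Time) are assumptions for illustration, and a single Stopwatch run is only a crude measurement; results depend on the runtime and compiler version.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class AppendTiming
{
    // First version from the question: concatenate, then append.
    public static string WithConcat(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i + ",");
        return sb.ToString();
    }

    // Second version from the question: two separate Append calls.
    public static string WithToString(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i.ToString());
            sb.Append(",");
        }
        return sb.ToString();
    }

    // Times a single call; a serious benchmark would warm up and repeat many times.
    public static TimeSpan Time(Func<int, string> build, int count)
    {
        var watch = Stopwatch.StartNew();
        build(count);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, compare AppendTiming.Time(AppendTiming.WithConcat, 1000000) with AppendTiming.Time(AppendTiming.WithToString, 1000000). Both methods build the same string, so only the elapsed time should differ.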
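A small sketch of the string.Format() case, assuming a made-up FormatSketch class: both methods return the same text, but only the first passes the int through an object parameter.

```csharp
using System;

public static class FormatSketch
{
    // The int is passed to string.Format(string, object), so it is boxed.
    public static string Boxed(int i)
    {
        return string.Format("Item {0}", i);
    }

    // ToString() is called first; the argument is already a string, so nothing is boxed.
    public static string Unboxed(int i)
    {
        return string.Format("Item {0}", i.ToString());
    }
}
```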

In response to Theodoros' question in the comment:

The compiler team certainly could have decided to do such an optimization, but my guess is that they decided the cost (in terms of additional complexity, time, additional testing, etc.) meant such a change was not worth the investment.

Basically, they would have had to put in a special-case branch for functions that ostensibly operate on strings but also offer an overload taking object (basically, if (boxing occurs && overload has string)). Inside that branch the compiler would also have to verify that the object overload does the same thing as the string overload, apart from calling ToString() on the arguments. It needs to do this because a user could create overloads in which one function takes a string and another takes an object, but the two perform different work on the arguments.
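To illustrate why the compiler could not blindly swap one overload for the other, here is a hypothetical pair of overloads (OverloadSketch and Describe are invented for this example) that do different work depending on the parameter type:

```csharp
using System;

public static class OverloadSketch
{
    // Hypothetical string overload: treats the argument as text.
    public static string Describe(string s)
    {
        return "text:" + s;
    }

    // Hypothetical object overload: reports the runtime type instead.
    public static string Describe(object o)
    {
        return "object of type " + o.GetType().Name;
    }
}
```

With int n = 5, Describe(n) binds to the object overload and boxes n, while Describe(n.ToString()) binds to the string overload, and the two calls return different results. An automatic rewrite from the first call to the second would change the program's behavior.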

This seems to me like a lot of complexity and analysis for making a minor optimization to a few string manipulation functions. Additionally, this would be mucking around with the core compiler function resolution code, which already has some very exact rules that people misunderstand all the time (take a look at a number of Eric Lippert's answers - quite a few revolve around function resolution issues). Making it more complicated with "it works like this, except when you have that situation" type rules is certainly something to be avoided if the return is minimal.

The less expensive and less complex solution is to use the base function resolution rules: when you pass a value type (like an int) into a function, the compiler figures out that the only signature that fits is one that takes object, and boxes the value. It then relies on users to do the ToString() optimization when they profile their code and determine it is necessary (or just know about this behavior and do it all the time anyway when they encounter the situation, which I do).

A more likely alternative would have been a number of string.Concat overloads that take ints, doubles, etc. (like string.Concat(int, int)) and simply call ToString on the arguments internally, where they would not be boxed. This has the advantage that the optimization lives in the class library instead of the compiler, but you inevitably run into situations where you want to mix types in the concatenation, like the original question here, which would need string.Concat(int, string). The permutations would explode, which is the likely reason they did not do so. They could also have picked the most commonly used combinations and provided the top 5, but I'm guessing they decided that would just open them up to people asking "well, you did (int, string), why don't you do (string, int)?".
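As a sketch of what such typed overloads might look like (these are hypothetical helpers in an invented ConcatSketch class, not part of the framework):

```csharp
using System;

public static class ConcatSketch
{
    // Hypothetical (int, string) overload: ToString() is called on the int directly, so no boxing.
    public static string Concat(int left, string right)
    {
        return string.Concat(left.ToString(), right);
    }

    // The reversed order needs its own overload, which is where the permutations start to add up.
    public static string Concat(string left, int right)
    {
        return string.Concat(left, right.ToString());
    }
}
```

Covering every pairing of int, double, string and so on, in both orders and for three or four arguments, would require a large number of near-identical methods.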

169

answered Sep 25 '22 19:09

Gjeltema

"Now we have an extra step where i is being converted to a string?"

It's not an extra step. Even in the first snippet, the integer i obviously has to be converted to a string somewhere. The addition operator takes care of this, so it happens where you don't see it, but it still happens.

The reason the second snippet is faster is that it does not have to create a new string by concatenating the result of i.ToString() and ",".

Here's what the first version does:

sb.Append ( i+",");

1. Call i.ToString.
2. Create a new string (think new string(iAsString + ",")).
3. Call sb.Append.

Here's what the second version does:

1. Call i.ToString.
2. Call sb.Append.
3. Call sb.Append.

As you can see, the only difference is the second step, where calling sb.Append in the second version is expected to be faster than concatenating two strings and creating another instance from the result.
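The two step lists can be spelled out with explicit temporaries. This is a sketch of the steps as listed in this answer, with invented names (StepsSketch, FirstVersion, SecondVersion); it is not a statement of what the compiler actually emits.

```csharp
using System;
using System.Text;

public static class StepsSketch
{
    // First version: ToString, build a temporary string, one Append.
    public static string FirstVersion(int i)
    {
        var sb = new StringBuilder();
        string iAsString = i.ToString();    // step 1
        string combined = iAsString + ",";  // step 2: allocates a new temporary string
        sb.Append(combined);                // step 3
        return sb.ToString();
    }

    // Second version: ToString, then two Appends and no temporary string.
    public static string SecondVersion(int i)
    {
        var sb = new StringBuilder();
        string iAsString = i.ToString();    // step 1
        sb.Append(iAsString);               // step 2
        sb.Append(",");                     // step 3
        return sb.ToString();
    }
}
```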

answered Sep 22 '22 19:09

Jon
