Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does StringBuilder work internally in C#?

How does StringBuilder work?

What does it do internally? Does it use unsafe code? And why is it so fast (compared to the + operator)?

like image 869
Alon Gubkin Avatar asked Jun 29 '11 16:06

Alon Gubkin


People also ask

How does string builder work internally?

StringBuilder objects are like String objects, except that they can be modified. Internally, these objects are treated like variable-length arrays that contain a sequence of characters. At any point, the length and content of the sequence can be changed through method invocations.

How does StringBuffer work internally?

StringBuffer, like StringBuilder allocates an array of char into which it copies the strings you append. It only creates new objects when the number of characters exceeds the size of the array, in which case it reallocates and copies the array.

How does string builder work C#?

The StringBuilder works by maintaining a buffer of characters (Char) that will form the final string. Characters can be appended, removed and manipulated via the StringBuilder, with the modifications being reflected by updating the character buffer accordingly. An array is used for this character buffer.

How does StringBuilder append work?

append(String str) method appends the specified string to this character sequence. The characters of the String argument are appended, in order, increasing the length of this sequence by the length of the argument.


2 Answers

When you use the + operator to build up a string:

string s = "01"; s += "02"; s += "03"; s += "04"; 

then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.

... s += "99"; 

On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = a lot, in order to make this 198 character string.

A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.

What happens when the guess is wrong and the array gets full? There are two strategies. In the previous version of the framework, the string builder reallocated and copied the array when it got full, and doubled its size. In the new implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.

Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.

like image 154
Eric Lippert Avatar answered Sep 25 '22 14:09

Eric Lippert


StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.

The reason StringBuilder is faster than using string concatenation in a loop is precisely because of the mutability - it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string etc.

For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.

See my article on StringBuilder for more information.

like image 42
Jon Skeet Avatar answered Sep 24 '22 14:09

Jon Skeet