Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String Concatenation unsafe in C#, need to use StringBuilder?

My question is this: Is string concatenation in C# safe? If string concatenation leads to unexpected errors, and replacing that string concatenation by using StringBuilder causes those errors to disappear, what might that indicate?

Background: I am developing a small command line C# application. It takes command line arguments, performs a slightly complicated SQL query, and outputs about 1300 rows of data into a formatted XML file.

My initial program would always run fine in debug mode. However, in release mode it would get to about the 750th SQL result, and then die with an error. The error was that a certain column of data could not be read, even through the Read() method of the SqlDataReader object had just returned true.

This problem was fixed by using StringBuilder for all operations in the code, where previously there had been "string1 + string2". I'm not talking about string concatenation inside the SQL query loop, where StringBuilder was already in use. I'm talking about simple concatenations between two or three short string variables earlier in the code.

I had the impression that C# was smart enough to handle the memory management for adding a few strings together. Am I wrong? Or does this indicate some other sort of code problem?

like image 743
Bryan Roach Avatar asked Apr 22 '09 20:04

Bryan Roach


People also ask

Is string concatenation possible in C?

As you know, the best way to concatenate two strings in C programming is by using the strcat() function.

Is string concatenation bad?

Due to this, mixing the StringBuilder and + method of concatenation is considered bad practice. Additionally, String concatenation using the + operator within a loop should be avoided. Since the String object is immutable, each call for concatenation will result in a new String object being created.

What problems are connected to string concatenation using '+'?

A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time.

Why should you be careful about string concatenation (+) operator in loops?

If you concatenate Stings in loops for each iteration a new intermediate object is created in the String constant pool. This is not recommended as it causes memory issues.


3 Answers

To answer your question: String contatenation in C# (and .NET in general) is "safe", but doing it in a tight loop as you describe is likely to cause severe memory pressure and put strain on the garbage collector.

I would hazard a guess that the errors you speak of were related to resource exhaustion of some sort, but it would be helpful if you could provide more detail — for example, did you receive an exception? Did the application terminate abnormally?

Background: .NET strings are immutable, so when you do a concatenation like this:

var stringList = new List<string> {"aaa", "bbb", "ccc", "ddd", //... };
string result = String.Empty;
foreach (var s in stringList)
{
    result = result + s;
}

This is roughly equivalent to the following:

string result = "";
result = "aaa"
string temp1 = result + "bbb";
result = temp1;
string temp2 = temp1 + "ccc";
result = temp2;
string temp3 = temp2 + "ddd";
result = temp3;
// ...
result = tempN + x;

The purpose of this example is to emphasise that each time around the loop results in the allocation of a new temporary string.

Since the strings are immutable, the runtime has no alternative options but to allocate a new string each time you add another string to the end of your result.

Although the result string is constantly updated to point to the latest and greatest intermediate result, you are producing a lot of these un-named temporary string that become eligible for garbage collection almost immediately.

At the end of this concatenation you will have the following strings stored in memory (assuming, for simplicity, that the garbage collector has not yet run).

string a = "aaa";
string b = "bbb";
string c = "ccc";
// ...
string temp1 = "aaabbb";
string temp2 = "aaabbbccc";
string temp3 = "aaabbbcccddd";
string temp4 = "aaabbbcccdddeee";
string temp5 = "aaabbbcccdddeeefff";
string temp6 = "aaabbbcccdddeeefffggg";
// ...

Although all of these implicit temporary variables are eligible for garbage collection almost immediately, they still have to be allocated. When performing concatenation in a tight loop this is going to put a lot of strain on the garbage collector and, if nothing else, will make your code run very slowly. I have seen the performance impact of this first hand, and it becomes truly dramatic as your concatenated string becomes larger.

The recommended approach is to always use a StringBuilder if you are doing more than a few string concatenations. StringBuilder uses a mutable buffer to reduce the number of allocations that are necessary in building up your string.

like image 102
Daniel Fortunov Avatar answered Oct 04 '22 21:10

Daniel Fortunov


String concatenation is safe though more memory intensive than using a StringBuilder if contatenating large numbers of strings in a loop. And in extreme cases you could be running out of memory.

It's almost certainly a bug in your code.

Maybe you're contatenating a very large number of strings. Or maybe it's something else completely different.

I'd go back to debugging without any preconceptions of the root cause - if you're still having problems try to reduce it to the minimum needed to repro the problem and post code.

like image 26
Joe Avatar answered Oct 04 '22 22:10

Joe


Apart from what you're doing is probably best done with XML APIs instead of strings or StringBuilder I doubt that the error you see is due to string concatenation. Maybe switching to StringBuilder just masked the error or went over it gracefully, but I doubt using strings really was the cause.

like image 28
Joey Avatar answered Oct 04 '22 22:10

Joey