We have a few operations where we are doing a large number of large string concatenations, and have recently encountered an out of memory exception. Unfortunately, debugging the code is not an option, as this is occurring at a customer site.
So, before looking into an overhaul of our code, I would like to ask: what are the RAM consumption characteristics of StringBuilder for large strings?
Especially as they compare to the standard string type. The strings are well over 10 MB, and we seem to run into the issues around 20 MB.
NOTE: This is not about speed but RAM.
Each time StringBuilder runs out of space, it allocates a new buffer twice the size of the original, copies the old characters over, and leaves the old buffer to be garbage collected. While the copy is in progress both buffers are alive, so if you are currently using some amount x, the new 2x buffer may be more than you are allowed to allocate. You may want to determine a maximum length for your strings and pass it to the StringBuilder constructor, so that the buffer is preallocated and you are not at the mercy of the doubling reallocation.
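As a minimal sketch of the preallocation idea, assuming you can put an upper bound on the final length: the class name PreallocatedBuilder, the method Join and the parameter maxLength are made up for illustration and are not from the question.

```csharp
using System;
using System.Text;

public static class PreallocatedBuilder
{
    // Builds one large string from many pieces, reserving the buffer up front.
    // maxLength is a hypothetical upper bound (in characters) supplied by the caller.
    public static string Join(string[] pieces, int maxLength)
    {
        // The constructor argument is the initial capacity in characters,
        // so the builder should not need to grow while the bound holds.
        var sb = new StringBuilder(maxLength);
        foreach (string piece in pieces)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }
}
```

Something like PreallocatedBuilder.Join(parts, 20 * 1024 * 1024) would reserve room for about 20 million characters. Keep in mind that a .NET char takes two bytes, and that in current .NET versions ToString() copies the characters into a new string, so the peak is still roughly the buffer plus the final string.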
Here is a nice study about String Concatenation vs Memory Allocation.
If you can avoid concatenating, do it!
This is a no brainer, if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it was a single string.
Don't ever use += for concatenation. Too many changes take place behind the scenes that aren't obvious from my code in the first place. I advise using String.Concat() explicitly with whichever overload fits (2 strings, 3 strings, string array). This clearly shows what your code does, without any surprises, while letting you keep a check on efficiency.
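A small sketch of what calling String.Concat() explicitly could look like. The wrapper class ConcatExamples and its method names are invented for illustration; the String.Concat overloads themselves are part of the framework.

```csharp
using System;

public static class ConcatExamples
{
    // Two-string overload: a single result string is produced.
    public static string TwoParts(string a, string b)
    {
        return string.Concat(a, b);
    }

    // Three-string overload.
    public static string ThreeParts(string a, string b, string c)
    {
        return string.Concat(a, b, c);
    }

    // Array overload: all pieces are joined in one call,
    // with no intermediate string per piece.
    public static string ManyParts(string[] parts)
    {
        return string.Concat(parts);
    }
}
```

Compared with a chain of += statements, each call here makes it explicit that exactly one new string is produced per concatenation.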
Try to estimate the target size of a StringBuilder.
The more accurately you can estimate the needed size, the fewer temporary strings the StringBuilder will have to create to increase its internal buffer.
Do not use any Format() methods when performance is an issue.
Too much overhead is involved in parsing the format, when you could construct an array out of the pieces if all you are using are {x} replacements. Format() is good for readability, but it is one of the things to go when you are squeezing all possible performance out of your application.
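To illustrate the trade-off, here is a hedged sketch that builds the same text both ways. The class Labels, its methods and the message text are hypothetical; only String.Format() and String.Concat() come from the framework.

```csharp
using System;

public static class Labels
{
    // Readable, but the format string has to be parsed at run time.
    public static string WithFormat(string name, int count)
    {
        return string.Format("{0} has {1} items", name, count);
    }

    // Same text built from pieces, with no format parsing involved.
    public static string WithConcat(string name, int count)
    {
        return string.Concat(name, " has ", count.ToString(), " items");
    }
}
```

Both methods should return the same text for ordinary inputs; whether the difference matters depends on how hot the code path is.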
You might be interested in the rope data structure. The article Ropes: Theory and practice explains their advantages. Maybe there is an implementation for .NET.
[Update, to answer the comment]
Does it use less memory? Search for "memory" in the article and you will find some hints.
Basically, yes, despite the structure overhead, because it only adds memory when needed. StringBuilder, when it exhausts its old buffer, must allocate a much bigger one (which may already leave a lot of memory unused) and drop the old one (which will be garbage collected, but can still use a lot of memory in the meantime).
I haven't found an implementation for .NET, but there is at least a C++ implementation (in SGI's STL: http://www.sgi.com/tech/stl/Rope.html). Maybe you can leverage this implementation. Note that the page I reference has a section on memory performance.
Note that ropes aren't the cure for all problems: their usefulness depends heavily on how you build your large strings and how you use them. The articles point out advantages and drawbacks.
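To make the idea concrete, here is a very small rope sketch in C#. It is not taken from the article or from SGI's implementation: the class Rope and its members are invented for illustration, and it leaves out the rebalancing, indexing and substring operations a real rope would provide.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Minimal immutable rope: a leaf holds a string, an inner node holds two children.
public sealed class Rope
{
    private readonly string _leaf;   // set only on leaf nodes
    private readonly Rope _left;     // set only on inner nodes
    private readonly Rope _right;    // set only on inner nodes

    public int Length { get; }

    public Rope(string text)
    {
        _leaf = text ?? string.Empty;
        Length = _leaf.Length;
    }

    private Rope(Rope left, Rope right)
    {
        _left = left;
        _right = right;
        Length = left.Length + right.Length;
    }

    // Concatenation allocates one small node; no characters are copied.
    public Rope Append(Rope other)
    {
        return new Rope(this, other);
    }

    // Streams the text to a writer, left to right, without building one big string.
    // An explicit stack is used so that deep, unbalanced trees do not overflow the call stack.
    public void WriteTo(TextWriter writer)
    {
        var pending = new Stack<Rope>();
        pending.Push(this);
        while (pending.Count > 0)
        {
            Rope node = pending.Pop();
            if (node._leaf != null)
            {
                writer.Write(node._leaf);
            }
            else
            {
                pending.Push(node._right); // visited after the left child
                pending.Push(node._left);
            }
        }
    }
}
```

The memory benefit only holds if the result is consumed piece by piece, for example written to a file or network stream through WriteTo. Converting the rope back into a single string would need the full contiguous allocation again.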
StringBuilder is a perfectly good solution to memory problems caused by concatenating strings.
To answer your specific question, StringBuilder has a constant-sized overhead compared to a normal string whose length equals the size of the currently allocated StringBuilder buffer. The buffer could potentially be twice the size of the resulting string, but no more memory allocations will be made when appending to the StringBuilder until the buffer is filled, so it is really an excellent solution.
Compared with string, this is outstanding.
string output = "Test";
output += ", printed on " + datePrinted.ToString();
output += ", verified by " + verificationName;
output += ", number lines: " + numberLines.ToString();
This code has four strings that are stored as literals in the code, two that are created by method calls and one that comes from a variable, but it uses six separate intermediate strings, which get longer and longer. If this pattern is continued, every append copies the whole string built so far, so the total memory allocated grows quadratically with the number of appends until the GC kicks in to clean it up.
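For comparison, the same line could be built with StringBuilder roughly as follows. This is a sketch: the wrapper ReportLine.Build and the parameter types (DateTime, string, int) are assumptions, since the original snippet does not show how datePrinted, verificationName and numberLines are declared.

```csharp
using System;
using System.Text;

public static class ReportLine
{
    // Parameter types are assumed; the original snippet does not declare them.
    public static string Build(DateTime datePrinted, string verificationName, int numberLines)
    {
        var sb = new StringBuilder("Test");
        // Each Append writes into the builder's buffer instead of
        // creating a new, longer string for every step.
        sb.Append(", printed on ").Append(datePrinted.ToString());
        sb.Append(", verified by ").Append(verificationName);
        sb.Append(", number lines: ").Append(numberLines);
        return sb.ToString();
    }
}
```

For a handful of short pieces like this the saving is negligible; it starts to matter when the pattern is repeated many times or the pieces are large.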