After doing some profiling, we've discovered that the current way in which our app concatenates strings causes an enormous amount of memory churn and CPU time.
We're building a List<string>
of strings to concatenate that is on the order of 500 thousand elements long, referencing several hundred megabytes worth of strings. We're trying to optimize this one small part of our app since it seems to account for a disproportionate amount of CPU and memory usage.
We do a lot of text processing :)
Theoretically, we should be able to perform the concatenation in a single allocation and N copies - we can know how many total characters are available in our string, so it should just be as simple as summing up the lengths of the component strings and allocating enough underlying memory to hold the result.
Assuming we're starting with a pre-filled List<string>
, is it possible to concatenate all strings in that list using a single allocation?
Currently, we're using the StringBuilder
class, but this stores its own intermediate buffer of all of the characters - so we have an ever growing chunk array, with each chunk storing a copy of the characters we're giving it. Far from ideal. The allocations for the array of chunks aren't horrible, but the worst part is that it allocates intermediate character arrays, which means N allocations and copies.
The best we can do right now is to call List<string>.ToArray()
- which performs one copy of a 500k element array - and pass the resulting string[]
to string.Concat(params string[])
. string.Concat()
then performs two allocations, one to copy the input array into an internal array, and the one to allocate the destination string's memory.
From referencesource.microsoft.com:
public static String Concat(params String[] values) {
if (values == null)
throw new ArgumentNullException("values");
Contract.Ensures(Contract.Result<String>() != null);
// Spec#: Consider a postcondition saying the length of this string == the sum of each string in array
Contract.EndContractBlock();
int totalLength=0;
// -----------> Allocation #1 <---------
String[] internalValues = new String[values.Length];
for (int i=0; i<values.Length; i++) {
string value = values[i];
internalValues[i] = ((value==null)?(String.Empty):(value));
totalLength += internalValues[i].Length;
// check for overflow
if (totalLength < 0) {
throw new OutOfMemoryException();
}
}
return ConcatArray(internalValues, totalLength);
}
private static String ConcatArray(String[] values, int totalLength) {
// -----------------> Allocation #2 <---------------------
String result = FastAllocateString(totalLength);
int currPos=0;
for (int i=0; i<values.Length; i++) {
Contract.Assert((currPos <= totalLength - values[i].Length),
"[String.ConcatArray](currPos <= totalLength - values[i].Length)");
FillStringChecked(result, currPos, values[i]);
currPos+=values[i].Length;
}
return result;
}
Thus, in the best case, we have three allocations, two for arrays referencing the component strings, and one for the destination concatenated string.
Can we improve on this? Is it possible to concatenate a List<string>
using a single allocation and a single loop of character copies?
I'd like to summarize the various approaches discussed so far, and why they are still sub-optimal. I'd also like to set the parameters of the situation in concrete a little more, since I've received a lot of questions that try to side step the central question.
...
First, the structure of the code that I am working within. There are three layers:
Talking about some numbers: a typical batch run will collect ~500000 strings from the content producers, representing about 200-500 MB of memory. I need the most efficient way to concatenate these 500k strings into a single string.
...
Now I'd like to examine the approaches discussed so far. For the sake of numbers, assume we're running 64-bit, assume that we are collecting 500000 string objects, and assume that the aggregate size of the string objects totals 200 megabytes worth of character data. Also, assume that the original string object's memory is not counted toward any approach's total in the below analysis. I make this assumption because it is necessarily common to any and all approaches, because it is an assumption that we cannot change the interface of the content producers - they return 500k relatively small fully formed strings objects that I must then accept and somehow concatenate. As stated above, I cannot change this interface.
Content producers ----> StringBuilder
----> string
Conceptually, this would be invoking the content producers, and directly writing the strings they return to a StringBuilder
, and then later calling StringBuilder.ToString()
to obtain the concatenated string.
By analyzing StringBuilder
's implementation, we can see that the cost of this boils down to 400 MB of allocations and copies:
StringBuilder
. We would be performing one 200 MB allocation to pre-allocate the StringBuilder
, and then 200 MB worth of copies as we copy and discard the strings returned from the content producersStringBuilder
, we then need to call StringBuilder.ToString()
. This performs exactly one allocation (string.FastAllocateString()
), and then copies the string data from its internal buffers to the string object's internal memory.Total cost: approximately 400 MB of allocations and copies
Content producers ---> pre-allocated char[]
---> string
This strategy is fairly simple. Assuming we know roughly how much character data we're going to be collecting from the producers, we can pre-allocate a char[]
that is 200 MB large. Then, as we call the content producers, we copy the strings they return into our char[]
. This accounts for 200 MB of allocations and copies. The final step to turn this into a string object is to pass it to the new string(char[])
constructor. However, since strings are immutable and arrays are not, the constructor will make a copy of that entire array, causing it to allocate and copy another 200 MB of character data.
Total cost: approximately 400 MB of allocations and copies
Content producers ---> List<string>
----> string[]
----> string.Concat(string[])
List<string>
to be about 500k elements - approximately 4 MB of allocations for List's underlying array (500k * 8 bytes per pointer == 4 MB of memory).List<string>.ToArray()
to obtain a string[]
. Approximately 4 MB of allocations and copies (again, we're really just copying pointers).string.Concat(string[])
:
string.FastAllocateString()
special method. Approximately 200 MB of allocations.Total cost: approximately 212 MB of allocations and copies
None of these approaches are ideal, however approach #3 is very close. We're assuming that the absolute minimum of memory that needs to be allocated and copied is 200 MB (for the destination string), and here we get pretty close - 212 MB.
If there were a string.Concat
overload that 1) Accepted an IList<string>
and 2) did not make a copy of that IList before using it, then the problem would be solved. No such method is provided by .Net, hence the subject of this question.
Progress on a solution.
I've done some testing with some hacked IL, and found that directly invoking string.FastAllocateString(n)
(which is not usually invokable...) is about as fast as invoking new string('\0', n)
, and both seem to allocate exactly as much memory as is expected.
From there, it seems its possible to acquire a pointer to the freshly allocated string using the unsafe
and fixed
statements.
And so, a rough solution begins to appear:
private static string Concat( List<string> list )
{
int concatLength = 0;
for( int i = 0; i < list.Count; i++ )
{
concatLength += list[i].Length;
}
string newString = new string( '\0', concatLength );
unsafe
{
fixed( char* ptr = newString )
{
...
}
}
return newString;
}
The next biggest hurdle is implementing or finding an efficient block copy method, ala Buffer.BlockCopy, except one that will accept char*
types.
Unless the average length of the strings is very small, the most efficient approach, given a List<String>
, will be to use ToArray()
to copy it to a new String[]
, and pass that to a concatenation or joining method. Doing that may cause a wasted allocation for an array of references if the concatenation or joining method wants to make a copy of its array before it starts, but that would only allocate one reference per string, there will only be one allocation to hold character data, and it will be correctly sized to hold the entire string.
If you're building the data structure yourself, you might gain a little bit of efficiency by initializing a String[]
to the estimated required size, populating it yourself, and expanding it as needed. That would save one allocation of a String[]
worth of data.
Another approach would be to allocate a String[8192][]
and then allocate a String[8192]
for each array of strings as you go along. Once you're all done, you'll know exactly what size String[]
you need to pass to the Concat
method so you can create an array of that exact size. This approach would require a greater quantity of allocations, but only the final String[]
and the String
itself would need to go on the Large Object Heap.
If you can determine the length of the concatenation before trying to perform the operation, a char array can beat string builder in some use cases. Manipulating the characters within the array prevents the multiple allocations.
See: http://blogs.msdn.com/b/cisg/archive/2008/09/09/performance-analysis-reveals-char-array-is-better-than-stringbuilder.aspx
UPDATE
Please check out this internal implementation of the String.Join
from .NET - it uses unsafe code with pointers to avoid multiple allocations. Unless I'm missing something, it would seem you can re-write this using your List to accomplish what you want:
[System.Security.SecuritySafeCritical] // auto-generated
public unsafe static String Join(String separator, String[] value, int startIndex, int count) {
//Range check the array
if (value == null)
throw new ArgumentNullException("value");
if (startIndex < 0)
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
if (count < 0)
throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
if (startIndex > value.Length - count)
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_IndexCountBuffer"));
Contract.EndContractBlock();
//Treat null as empty string.
if (separator == null) {
separator = String.Empty;
}
//If count is 0, that skews a whole bunch of the calculations below, so just special case that.
if (count == 0) {
return String.Empty;
}
int jointLength = 0;
//Figure out the total length of the strings in value
int endIndex = startIndex + count - 1;
for (int stringToJoinIndex = startIndex; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
if (value[stringToJoinIndex] != null) {
jointLength += value[stringToJoinIndex].Length;
}
}
//Add enough room for the separator.
jointLength += (count - 1) * separator.Length;
// Note that we may not catch all overflows with this check (since we could have wrapped around the 4gb range any number of times
// and landed back in the positive range.) The input array might be modifed from other threads,
// so we have to do an overflow check before each append below anyway. Those overflows will get caught down there.
if ((jointLength < 0) || ((jointLength + 1) < 0) ) {
throw new OutOfMemoryException();
}
//If this is an empty string, just return.
if (jointLength == 0) {
return String.Empty;
}
string jointString = FastAllocateString( jointLength );
fixed (char * pointerToJointString = &jointString.m_firstChar) {
UnSafeCharBuffer charBuffer = new UnSafeCharBuffer( pointerToJointString, jointLength);
// Append the first string first and then append each following string prefixed by the separator.
charBuffer.AppendString( value[startIndex] );
for (int stringToJoinIndex = startIndex + 1; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
charBuffer.AppendString( separator );
charBuffer.AppendString( value[stringToJoinIndex] );
}
Contract.Assert(*(pointerToJointString + charBuffer.Length) == '\0', "String must be null-terminated!");
}
return jointString;
}
Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs
UPDATE 2
Good point on the fast allocate. According to an old SO post, you can wrap FastAllocate using reflection (assuming of course you'd cache the fastAllocate method reference so you just called Invoke
each time. Perhaps the tradeoff of the call is better than what you're doing now.
var fastAllocate = typeof (string).GetMethods(BindingFlags.NonPublic | BindingFlags.Static)
.First(x => x.Name == "FastAllocateString");
var newString = (string)fastAllocate.Invoke(null, new object[] {20});
Console.WriteLine(newString.Length); // 20
Perhaps another approach is to use unsafe code to copy your allocation into a char* array, then pass this to the string constructor. The string constructor with char* is an extern
passed to the underlying C++ implementation. I haven't found a reliable source for that code to confirm, but perhaps this can be faster for you. The non-prod ready code (no checks for potential overflow, add fixed to lock strings from garbage collection, etc) would start with:
public unsafe string MyConcat(List<string> values)
{
int index = 0;
int totalLength = values.Sum(m => m.Length);
char* concat = stackalloc char[totalLength + 1]; // Add additional char for null term
foreach (var value in values)
{
foreach (var c in value)
{
concat[index] = c;
index++;
}
}
concat[index] = '\0';
return new string(concat);
}
Now I'm all out of ideas for this :) Perhaps somebody can figure out a method here with marshalling to avoid unsafe code. Since introducing unsafe code requires adding the unsafe flag to compilation, consider adding this piece as a separate dll to minimize your app's security risk if you go down that route.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With