I've been researching a question that was presented to me: How to write a function that takes a string as input and returns a string with spaces between the characters. The function is to be written to optimize performance when it is called thousands of times per second.
I know that .net has a function called String.Join
, to which I may pass in the space character as a separator along with the original string.
Barring the use of String.Join
, I can use the StringBuilder
class to append spaces after each character.
Another way to accomplish this task is to declare a character array with 2*n-1 characters (You have to add n-1 characters for the spaces). The character array can be filled in a loop and then passed to the String constructor
.
I've written some .net code that runs each of these algorithms one millions times each with the parameter "Hello, World"
and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).
I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .net function such as String.Join
should yield good performance. Why is using String.Join
so much slower than doing the work by hand?
public static class TestClass
{
// 491 milliseconds for 1 million iterations
public static string Space1(string s)
{
return string.Join(" ", s.AsEnumerable());
}
//190 milliseconds for 1 million iterations
public static string Space2(string s)
{
if (s.Length < 2)
return s;
StringBuilder sb = new StringBuilder();
sb.Append(s[0]);
for (int i = 1; i < s.Length; i++)
{
sb.Append(' ');
sb.Append(s[i]);
}
return sb.ToString();
}
// 50 milliseconds for 1 million iterations
public static string Space3(string s)
{
if (s.Length < 2)
return s;
char[] array = new char[s.Length * 2 - 1];
array[0] = s[0];
for (int i = 1; i < s.Length; i++)
{
array[2*i-1] = ' ';
array[2*i] = s[i];
}
return new string(array);
}
Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.
Long answer: if you already have an array of strings to concatenate together (with a delimiter), String. Join is the fastest way of doing it. String. Join can look through all of the strings to work out the exact length it needs, then go again and copy all the data.
When concatenating three dynamic string values or less, use traditional string concatenation. When concatenating more than three dynamic string values, use StringBuilder . When building a big string from several string literals, use either the @ string literal or the inline + operator.
I have taken a look at String. Format (using Reflector) and it actually creates a StringBuilder then calls AppendFormat on it. So it is quicker than concat for multiple stirngs.
As you can see in Figure 1, concatenating strings using StringBuilder is much faster, consumes much less memory, and uses fewer garbage collections in all generations, compared to using the '+' operator to combine two or more strings.
Why is using String.Join so much slower than doing the work by hand?
The reason String.Join
is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>
.
String.Join<T>(string, IEnumerable<T>)
(the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading flexibility for pure performance and speed.
Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.
In this case, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join
. In this case, since you know, exactly, in advance what is required, you have the ability to avoid all of the overhead required to have a flexible design by pre-allocating an array of exactly the right size, and building the results manually.
You'll find that, in general, it's often possible to write a method that will out perform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that you can't optimize for a specific input scenario.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With