I was running a local test to compare the performance of Replace operations on String and StringBuilder in C#. For String I was using the following code:
String str = "String to be tested. String to be tested. String to be tested.";
str = str.Replace("i", "in");
str = str.Replace("to", "ott");
str = str.Replace("St", "Tsr");
str = str.Replace(".", "\n");
str = str.Replace("be", "or be");
str = str.Replace("al", "xd");
Then, after noticing that String.Replace() was faster than StringBuilder.Replace(), I tested the following code against the one above:
String str = "String to be tested. String to be tested. String to be tested.";
str = str.Replace("i", "in").Replace("to", "ott").Replace("St", "Tsr").Replace(".", "\n").Replace("be", "or be").Replace("al", "xd");
This last one turned out to be around 10% to 15% faster. Any ideas on why it is faster? Is assigning a value to the same variable that expensive?
I've made this benchmark:
namespace StringReplace
{
    using System;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Program>();
        }

        private String str = "String to be tested. String to be tested. String to be tested.";

        // One Replace call per statement, each result assigned back to the local.
        [Benchmark]
        public string Test1()
        {
            var a = str;
            a = a.Replace("i", "in");
            a = a.Replace("to", "ott");
            a = a.Replace("St", "Tsr");
            a = a.Replace(".", "\n");
            a = a.Replace("be", "or be");
            a = a.Replace("al", "xd");
            return a;
        }

        // The same Replace calls chained in a single statement.
        [Benchmark]
        public string Test2()
        {
            var a = str;
            a = a.Replace("i", "in").Replace("to", "ott").Replace("St", "Tsr").Replace(".", "\n").Replace("be", "or be").Replace("al", "xd");
            return a;
        }
    }
}
Results:
BenchmarkDotNet=v0.10.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-7700 CPU 3.60GHz, ProcessorCount=8
Frequency=3515629 Hz, Resolution=284.4441 ns, Timer=TSC
Host Runtime=Clr 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.7.2600.0
Job Runtime(s):
Clr 4.0.30319.42000, Arch=32-bit RELEASE
Method | Mean | StdDev | Median |
------- |---------- |---------- |---------- |
Test1 | 1.3768 us | 0.0354 us | 1.3704 us |
Test2 | 1.3941 us | 0.0325 us | 1.3778 us |
As you can see, the results are the same in Release mode. So I think there can be a small difference in Debug mode because of the extra assignments to the variable, but in Release mode the compiler can optimize them away.
It looks like you're compiling in a Debug configuration. Because the compiler needs to ensure each statement of source code can have a breakpoint set on it, the excerpt that assigns to the local many times is less efficient.
If you compile in a Release configuration, which optimizes code generation at the expense of not letting you set breakpoints, both excerpts compile to the same intermediate code and thus should have the same performance.
Note that whether you compile in a Debug or Release configuration isn't necessarily related to whether you start the app from Visual Studio with a debugger (F5) or not (Ctrl + F5). For more details, see my answer here.
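As a rough way to tell these two things apart at run time, the sketch below reports separately whether the assembly was compiled with the JIT optimizer disabled and whether a debugger is attached. The class name BuildInfo is made up for this illustration; DebuggableAttribute.IsJITOptimizerDisabled and Debugger.IsAttached are the framework members it relies on, and it assumes the attribute is simply absent or reports false for an optimized build.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper for illustration; not part of the original question or answers.
public static class BuildInfo
{
    // True when the assembly carries a DebuggableAttribute that disables the
    // JIT optimizer, which is what a default Debug configuration emits.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    public static void Main()
    {
        var assembly = typeof(BuildInfo).Assembly;
        // The two answers are independent: a Release build can run under a
        // debugger, and a Debug build can run without one.
        Console.WriteLine("Compiled without optimizations: " + IsJitOptimizerDisabled(assembly));
        Console.WriteLine("Debugger attached: " + Debugger.IsAttached);
    }
}
```

Running this from Visual Studio with F5 and with Ctrl + F5, in both configurations, should show the two values changing independently of each other.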
C# compiles down to .NET intermediate language (IL, also called MSIL or CIL). A tool that ships with the .NET SDK, the IL Disassembler, can show us this intermediate language to better understand the difference. Note that the .NET runtime (VES) is a stack machine - instead of registers, IL operates on an "evaluation stack" onto which values are pushed and from which they are popped. The details aren't too important for this question, but know that the evaluation stack is the place where temporary values are stored.
Disassembling the first excerpt, which I compiled without setting the "optimize code" option (i.e., I compiled using a Debug configuration), shows code like this:
.locals init ([0] string str)
IL_0000: nop
IL_0001: ldstr "String to be tested. String to be tested. String t" + "o be tested."
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldstr "i"
IL_000d: ldstr "in"
IL_0012: callvirt instance string [mscorlib]System.String::Replace(string, string)
IL_0017: stloc.0
IL_0018: ldloc.0
IL_0019: ldstr "to"
IL_001e: ldstr "ott"
IL_0023: callvirt instance string [mscorlib]System.String::Replace(string, string)
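The method has one local variable, str. In brief, the excerpt:
1. Creates the "String to be tested..." string on the evaluation stack (ldstr).
2. Stores the string into the local (stloc.0), resulting in an empty evaluation stack.
3. Loads that value back from the local (ldloc.0).
4. Calls Replace on the loaded value with two other strings, "i" and "in" (the two ldstr and the callvirt), resulting in an evaluation stack with only the resulting string.
5. Stores the result back in the local (stloc.0), resulting in an empty evaluation stack.
6. Loads that value back from the local (ldloc.0).
7. Calls Replace on the loaded value with two other strings, "to" and "ott" (the two ldstr and the callvirt).
And so on and so forth.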
Compare to the second excerpt, also compiled without "optimized code":
.locals init ([0] string str)
IL_0000: nop
IL_0001: ldstr "String to be tested. String to be tested. String t" + "o be tested."
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldstr "i"
IL_000d: ldstr "in"
IL_0012: callvirt instance string [mscorlib]System.String::Replace(string, string)
IL_0017: ldstr "to"
IL_001c: ldstr "ott"
IL_0021: callvirt instance string [mscorlib]System.String::Replace(string, string)
After step 4, the evaluation stack has the result of the first Replace call on it. Because the C# code in this case doesn't assign this intermediate value to the str variable, the IL can avoid storing and re-loading the value, and just re-use the result that's already on the evaluation stack. This skips steps 5 and 6, leading to slightly more performant code.
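To make the difference concrete, here is a toy model of the evaluation stack, written as a minimal sketch rather than anything the runtime actually does. The ToyStackMachine class, its instruction strings, and the "replace" pseudo-instruction (standing in for the callvirt to String.Replace) are all invented for this illustration, and the input string is shortened.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy interpreter; it only mimics the handful of IL instructions shown above.
public static class ToyStackMachine
{
    // One statement per Replace: the result is stored and re-loaded in between.
    public static readonly string[] Assigning =
    {
        "ldstr String to be tested.", "stloc.0", "ldloc.0",
        "ldstr i", "ldstr in", "replace",
        "stloc.0", "ldloc.0",
        "ldstr to", "ldstr ott", "replace"
    };

    // Chained calls: the first result stays on the stack for the second call.
    public static readonly string[] Chained =
    {
        "ldstr String to be tested.", "stloc.0", "ldloc.0",
        "ldstr i", "ldstr in", "replace",
        "ldstr to", "ldstr ott", "replace"
    };

    public static string Run(string[] program)
    {
        var stack = new Stack<string>();
        string local0 = null;
        foreach (var instruction in program)
        {
            if (instruction.StartsWith("ldstr ", StringComparison.Ordinal))
                stack.Push(instruction.Substring(6));   // push a string literal
            else if (instruction == "stloc.0")
                local0 = stack.Pop();                   // pop into the local
            else if (instruction == "ldloc.0")
                stack.Push(local0);                     // push the local's value
            else if (instruction == "replace")
            {
                // Arguments come off the stack in reverse order of pushing.
                var newValue = stack.Pop();
                var oldValue = stack.Pop();
                var target = stack.Pop();
                stack.Push(target.Replace(oldValue, newValue));
            }
            else
                throw new ArgumentException("Unknown instruction: " + instruction);
        }
        return stack.Pop();
    }

    public static void Main()
    {
        Console.WriteLine(Assigning.Length + " instructions -> " + Run(Assigning));
        Console.WriteLine(Chained.Length + " instructions -> " + Run(Chained));
    }
}
```

Both programs leave the same string on the stack; the chained one simply gets there with two fewer instructions per additional Replace call.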
But wait, surely the compiler knows these excerpts are equivalent, right? Why doesn't it always produce the second, more efficient, set of IL instructions? Because I compiled without optimizations. The compiler thus assumes that I need to be able to set a breakpoint on each C# statement. At a breakpoint, the locals need to be in a consistent state, and the evaluation stack needs to be empty. That's why the first excerpt has steps 5 and 6 - so that the debugger can stop on a breakpoint between those steps, and I'll see that the str local has the value I would expect on that line.
If I compile these excerpts with optimizations on (i.e., using a Release configuration), then indeed the compiler produces the same code for each:
// no .locals directive
IL_0000: ldstr "String to be tested. String to be tested. String t" + "o be tested."
IL_0005: ldstr "i"
IL_000a: ldstr "in"
IL_000f: callvirt instance string [mscorlib]System.String::Replace(string, string)
IL_0014: ldstr "to"
IL_0019: ldstr "ott"
IL_001e: callvirt instance string [mscorlib]System.String::Replace(string, string)
Now that the compiler knows I won't be able to set breakpoints, it can forgo using a local at all, and have the entire set of operations just occur on the evaluation stack. As a result, it can skip steps 2, 3, 5, and 6, leading to further optimized code.
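If you don't want to open the IL Disassembler, one rough way to check this yourself is to compare the size of the IL emitted for each form using reflection. This is only a sketch: the IlSizeComparison class and its method names are made up for the illustration, the excerpts are shortened to two Replace calls, and the sizes you see depend on your compiler version and configuration.

```csharp
using System;
using System.Reflection;

// Hypothetical comparison harness; not part of the original excerpts.
public static class IlSizeComparison
{
    public static string Assigning()
    {
        string str = "String to be tested.";
        str = str.Replace("i", "in");
        str = str.Replace("to", "ott");
        return str;
    }

    public static string Chained()
    {
        string str = "String to be tested.";
        str = str.Replace("i", "in").Replace("to", "ott");
        return str;
    }

    // Number of bytes of IL the C# compiler emitted for the named method.
    public static int IlSize(string methodName)
    {
        MethodInfo method = typeof(IlSizeComparison).GetMethod(methodName);
        return method.GetMethodBody().GetILAsByteArray().Length;
    }

    public static void Main()
    {
        Console.WriteLine("Assigning: " + IlSize("Assigning") + " bytes of IL");
        Console.WriteLine("Chained:   " + IlSize("Chained") + " bytes of IL");
    }
}
```

Going by the disassembly above, the Assigning method should come out larger when compiled without optimizations, and the two sizes should match when compiled with them.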