Consider the following simple program:
using System;
using System.Diagnostics;

class Program
{
    private static void Main(string[] args)
    {
        const int size = 10000000;
        var array = new string[size];
        var str = new string('a', 100);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < size; i++)
        {
            var str2 = new string('a', 100);
            //array[i] = str2; // This is slow
            array[i] = str; // This is fast
        }
        sw.Stop();
        Console.WriteLine("Took " + sw.ElapsedMilliseconds + "ms.");
    }
}
If I run this, it's relatively fast. If I uncomment the "slow" line and comment out the "fast" line, it's more than 5x slower. Note that in both cases the string "str2" is created inside the loop. The allocation is not optimized away in either case (this can be verified by looking at the IL or disassembly).
The code would seem to be doing about the same amount of work in either case. It needs to allocate/initialize a string, and then assign a reference to an array location. The only difference is whether that reference is the local var "str" or "str2".
Why does it make such a large performance difference assigning the reference to "str" vs. "str2"?
If we look at the disassembly, there is a difference:
(fast)
var str2 = new string('a', 100);
0000008e mov r8d,64h
00000094 mov dx,61h
00000098 xor ecx,ecx
0000009a call 000000005E393928
0000009f mov qword ptr [rsp+58h],rax
000000a4 nop
(slow)
var str2 = new string('a', 100);
00000085 mov r8d,64h
0000008b mov dx,61h
0000008f xor ecx,ecx
00000091 call 000000005E383838
00000096 mov qword ptr [rsp+58h],rax
0000009b mov rax,qword ptr [rsp+58h]
000000a0 mov qword ptr [rsp+38h],rax
The "slow" version has two additional "mov" operations where the "fast" version just has a "nop".
Can anyone explain what's happening here? It's difficult to see how two extra mov operations can cause a >5x slowdown, especially since I would expect the vast bulk of the time to be spent in the string initialization. Thanks for any insights.
You're right that the code does about the same amount of work in either case.
But the garbage collector ends up doing very different things in the two cases.
In the str version, at most two string instances are alive at any given time. This means (almost) all new objects in generation 0 die and nothing needs to be promoted to generation 1. Since generation 1 isn't growing at all, the GC has no reason to attempt expensive "full collections".
In the str2 version, all the new string instances stay alive. Objects get promoted to higher generations (which may involve moving them in memory). Also, since the higher generations are now growing, the GC will occasionally try to run full collections.
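To make the promotion step concrete, here is a minimal sketch that asks the runtime which generation a surviving object is in after each forced collection. The class and method names (PromotionDemo, GenerationsAfterCollections) are made up for illustration; GC.GetGeneration and GC.Collect are the standard System.GC calls. The exact numbers depend on the runtime and GC configuration, so treat the typical 0, 1, 2 progression as an expectation rather than a guarantee.

```csharp
using System;

static class PromotionDemo
{
    // Returns the generation of one live string right after allocation
    // and after each of two forced collections.
    public static int[] GenerationsAfterCollections()
    {
        var survivor = new string('a', 100);
        var generations = new int[3];

        generations[0] = GC.GetGeneration(survivor); // small new objects normally start in generation 0
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor); // survived one collection, usually promoted
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor); // survived another, usually promoted again

        GC.KeepAlive(survivor); // keep the reference live until this point
        return generations;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", GenerationsAfterCollections()));
    }
}
```

In the str2 version of the question, every one of the 10 million strings goes through this kind of promotion because the array keeps them all reachable.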
Note that the .NET GC tends to take time linear in the number of live objects: live objects need to be traversed and moved out of the way, while dead objects don't cost anything at all (they simply get overwritten the next time memory is allocated).
This means str is the best case for garbage collector performance, while str2 is the worst case.
Take a look at the GC performance counters for your program; I suspect you'll see very different results between the two versions.
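If you don't want to set up performance counters, a rough alternative is to read GC.CollectionCount before and after the loop. The sketch below wraps the loop from the question in a helper; GcCountDemo, Run and the keepEachString flag are names invented here, and the tuple syntax assumes C# 7 or later. Like the original, the str2 case at full size needs a 64-bit process with a couple of GB of free memory.

```csharp
using System;
using System.Diagnostics;

static class GcCountDemo
{
    // Runs the loop from the question and returns how many collections of each
    // generation happened while it ran, plus the elapsed time in milliseconds.
    // Note: a higher-generation collection also counts as a collection of the
    // lower generations, so Gen0 >= Gen1 >= Gen2.
    public static (int Gen0, int Gen1, int Gen2, long Ms) Run(bool keepEachString, int size)
    {
        var array = new string[size];
        var str = new string('a', 100);

        // Start from a settled heap so earlier runs do not skew the counts.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int g0 = GC.CollectionCount(0), g1 = GC.CollectionCount(1), g2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < size; i++)
        {
            var str2 = new string('a', 100);
            array[i] = keepEachString ? str2 : str; // str2 = "slow" case, str = "fast" case
        }
        sw.Stop();
        GC.KeepAlive(array); // the array (and whatever it references) stays live through the loop

        return (GC.CollectionCount(0) - g0,
                GC.CollectionCount(1) - g1,
                GC.CollectionCount(2) - g2,
                sw.ElapsedMilliseconds);
    }

    static void Main()
    {
        const int size = 10000000;
        foreach (bool keep in new[] { false, true })
        {
            var r = Run(keep, size);
            Console.WriteLine((keep ? "str2" : "str") + ": gen0=" + r.Gen0 +
                              " gen1=" + r.Gen1 + " gen2=" + r.Gen2 + ", " + r.Ms + "ms");
        }
    }
}
```

If the explanation above holds, the str2 run should report noticeably more generation 1 and generation 2 collections than the str run; the actual counts will vary with the machine and GC mode.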
No, a local reference is not slow.
What is slow is creating tons of new string instances (strings are class instances, so each one lives on the heap), while the fast version reuses the same instance. That reuse can also be optimized away, while the constructor call cannot.