
Why does a local var reference cause a large performance degradation?

Tags: performance, c#

Consider the following simple program:

using System;
using System.Diagnostics;

class Program
{
   private static void Main(string[] args)
   {
      const int size = 10000000;
      var array = new string[size];

      var str = new string('a', 100);
      var sw = Stopwatch.StartNew();
      for (int i = 0; i < size; i++)
      {
         var str2 = new string('a', 100);
         //array[i] = str2; // This is slow
         array[i] = str; // This is fast
      }
      sw.Stop();
      Console.WriteLine("Took " + sw.ElapsedMilliseconds + "ms.");
   }
}

If I run this, it's relatively fast. If I uncomment the "slow" line and comment out the "fast" line, it's more than 5x slower. Note that in both cases the string "str2" is initialized inside the loop. This is not optimized away in either case (this can be verified by looking at the IL or the disassembly).

The code would seem to be doing about the same amount of work in either case. It needs to allocate/initialize a string, and then assign a reference to an array location. The only difference is whether that reference is the local var "str" or "str2".

Why does it make such a large performance difference assigning the reference to "str" vs. "str2"?

If we look at the disassembly, there is a difference:

(fast)
     var str2 = new string('a', 100);
0000008e  mov         r8d,64h 
00000094  mov         dx,61h 
00000098  xor         ecx,ecx 
0000009a  call        000000005E393928 
0000009f  mov         qword ptr [rsp+58h],rax 
000000a4  nop

(slow)
     var str2 = new string('a', 100);
00000085  mov         r8d,64h 
0000008b  mov         dx,61h 
0000008f  xor         ecx,ecx 
00000091  call        000000005E383838 
00000096  mov         qword ptr [rsp+58h],rax 
0000009b  mov         rax,qword ptr [rsp+58h] 
000000a0  mov         qword ptr [rsp+38h],rax

The "slow" version has two additional "mov" operations where the "fast" version just has a "nop".

Can anyone explain what's happening here? It's difficult to see how two extra mov operations can cause a >5x slowdown, especially since I would expect the vast bulk of the time to be spent in the string initialization. Thanks for any insights.

JonB asked May 09 '16 16:05



2 Answers

You're right that the code does about the same amount of work in either case.

But the garbage collector ends up doing very different things in the two cases.

In the str version, at most two string instances are alive at any given time. This means (almost) all new objects in generation 0 die, so nothing needs to be promoted to generation 1. Since generation 1 isn't growing at all, the GC has no reason to attempt expensive "full collections".

In the str2 version, all the new string instances stay alive. Objects get promoted to higher generations (which may involve moving them in memory). Also, since the higher generations are now growing, the GC will occasionally try to run full collections.
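The promotion described above can be observed directly. The following is a minimal sketch, not part of the original answer: the class and method names (PromotionDemo, TrackGenerations) are made up for illustration, and it uses only GC.GetGeneration, GC.Collect and GC.KeepAlive. It keeps one small string alive across two forced collections and records which generation the GC reports for it each time. On a typical runtime the value is expected to climb, but the exact numbers depend on the runtime and GC configuration.

```csharp
using System;

static class PromotionDemo
{
    // Returns the generation reported for one live string:
    // right after allocation, after one forced GC, and after a second forced GC.
    public static int[] TrackGenerations()
    {
        var obj = new string('a', 100);
        var gens = new int[3];

        gens[0] = GC.GetGeneration(obj); // freshly allocated
        GC.Collect();                    // obj survives because it is still referenced
        gens[1] = GC.GetGeneration(obj);
        GC.Collect();
        gens[2] = GC.GetGeneration(obj);

        GC.KeepAlive(obj);               // make sure obj counts as live up to this point
        return gens;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" -> ", TrackGenerations()));
    }
}
```

In the str2 version of the question, every one of the 10,000,000 strings is in the position of obj here: it is still referenced (by the array) whenever a collection runs, so it has to be kept and possibly moved.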

Note that the .NET GC tends to take time linear in the number of live objects: live objects need to be traversed and moved out of the way, while dead objects don't cost anything at all (they simply get overwritten the next time memory is allocated).

This means str is the best case for garbage collector performance, while str2 is the worst case.

Take a look at the GC performance counters for your program. I suspect you'll see very different results between the two versions.
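If setting up performance counters is inconvenient, a rough in-process alternative is GC.CollectionCount. The sketch below is an assumption-laden illustration rather than the original poster's code: GcCountDemo, Run and the keepNew flag are hypothetical names, and the loop body is the one from the question with the assignment chosen by a flag. It reports how many collections each generation went through while the loop ran, alongside the elapsed time. Like the original program, the str2 variant at full size needs a 64-bit process and a couple of gigabytes of memory.

```csharp
using System;
using System.Diagnostics;

static class GcCountDemo
{
    // Runs the loop from the question and returns, per generation,
    // how many collections happened while it ran.
    // keepNew = true stores str2 (the "slow" line); false stores str (the "fast" line).
    public static int[] Run(int size, bool keepNew)
    {
        var array = new string[size];
        var str = new string('a', 100);

        int maxGen = GC.MaxGeneration;
        var before = new int[maxGen + 1];
        for (int g = 0; g <= maxGen; g++)
            before[g] = GC.CollectionCount(g);

        for (int i = 0; i < size; i++)
        {
            var str2 = new string('a', 100);
            array[i] = keepNew ? str2 : str;
        }

        var delta = new int[maxGen + 1];
        for (int g = 0; g <= maxGen; g++)
            delta[g] = GC.CollectionCount(g) - before[g];

        GC.KeepAlive(array); // the array (and what it references) stays live during the loop
        return delta;
    }

    static void Main()
    {
        const int size = 10000000;
        foreach (bool keepNew in new[] { false, true })
        {
            var sw = Stopwatch.StartNew();
            int[] counts = Run(size, keepNew);
            sw.Stop();
            Console.WriteLine((keepNew ? "str2" : "str") + ": " + sw.ElapsedMilliseconds
                + "ms, collections per generation: " + string.Join("/", counts));
        }
    }
}
```

If the explanation above holds, the str2 run should show noticeably more activity in the higher generations than the str run. The actual counts vary with runtime version and GC mode, so none are quoted here.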

Daniel answered Oct 31 '22 16:10



No, a local reference is not slow.

What is slow is creating tons of new string instances (string is a class, so each instance is a separate heap object), while the fast version reuses the same instance. That reuse can also be optimized away, whereas the constructor call cannot.

TomTom answered Oct 31 '22 16:10
