Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are operators so much slower than method calls? (structs are slower only on older JITs)

Intro: I write high-performance code in C#. Yes, I know C++ would give me better optimization, but I still choose to use C#. I do not wish to debate that choice. Rather, I'd like to hear from those who, like me, are trying to write high-performance code on the .NET Framework.

Questions:

  • Why is the operator in the code below slower than the equivalent method call??
  • Why is the method passing two doubles in the code below faster than the equivalent method passing a struct that has two doubles inside? (A: older JITs optimize structs poorly)
  • Is there a way to get the .NET JIT Compiler to treat simple structs as efficiently as the members of the struct? (A: get newer JIT)

What I think I know: The original .NET JIT Compiler would not inline anything that involved a struct. Bizarre given structs should only be used where you need small value types that should be optimized like built-ins, but true. Fortunately, in .NET 3.5SP1 and .NET 2.0SP2, they made some improvements to the JIT Optimizer, including improvements to inlining, particularly for structs. (I am guessing they did that because otherwise the new Complex struct that they were introducing would have performed horribly... so the Complex team was probably pounding on the JIT Optimizer team.) So, any documentation prior to .NET 3.5 SP1 is probably not too relevant to this issue.

What my testing shows: I have verified that I do have the newer JIT Optimizer by checking that C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll file does have version >= 3053 and so should have those improvements to the JIT optimizer. However, even with that, what my timings and looks at the disassembly both show are:

The JIT-produced code for passing a struct with two doubles is far less efficient than code that directly passes the two doubles.

The JIT-produced code for a struct method passes in 'this' far more efficiently than if you passed a struct as an argument.

The JIT still inlines better if you pass two doubles rather than passing a struct with two doubles, even with the multiplier due to being clearly in a loop.

The Timings: Actually, looking at the disassembly I realize that most of the time in the loops is just accessing the test data out of the List. The difference between the four ways of making the same calls is dramatically different if you factor out the overhead code of the loop and the accessing of the data. I get anywhere from 5x to 20x speedups for doing PlusEqual(double, double) instead of PlusEqual(Element). And 10x to 40x for doing PlusEqual(double, double) instead of operator +=. Wow. Sad.

Here's one set of timings:

Populating List<Element> took 320ms. The PlusEqual() method took 105ms. The 'same' += operator took 131ms. The 'same' -= operator took 139ms. The PlusEqual(double, double) method took 68ms. The do nothing loop took 66ms. The ratio of operator with constructor to method is 124%. The ratio of operator without constructor to method is 132%. The ratio of PlusEqual(double,double) to PlusEqual(Element) is 64%. If we remove the overhead time for the loop accessing the elements from the List... The ratio of operator with constructor to method is 166%. The ratio of operator without constructor to method is 187%. The ratio of PlusEqual(double,double) to PlusEqual(Element) is 5%. 

The Code:

namespace OperatorVsMethod {   public struct Element   {     public double Left;     public double Right;      public Element(double left, double right)     {       this.Left = left;       this.Right = right;     }      public static Element operator +(Element x, Element y)     {       return new Element(x.Left + y.Left, x.Right + y.Right);     }      public static Element operator -(Element x, Element y)     {       x.Left += y.Left;       x.Right += y.Right;       return x;     }          /// <summary>     /// Like the += operator; but faster.     /// </summary>     public void PlusEqual(Element that)     {       this.Left += that.Left;       this.Right += that.Right;     }          /// <summary>     /// Like the += operator; but faster.     /// </summary>     public void PlusEqual(double thatLeft, double thatRight)     {       this.Left += thatLeft;       this.Right += thatRight;     }       }        [TestClass]   public class UnitTest1   {     [TestMethod]     public void TestMethod1()     {       Stopwatch stopwatch = new Stopwatch();        // Populate a List of Elements to multiply together       int seedSize = 4;       List<double> doubles = new List<double>(seedSize);       doubles.Add(2.5d);       doubles.Add(100000d);       doubles.Add(-0.5d);       doubles.Add(-100002d);        int size = 2500000 * seedSize;       List<Element> elts = new List<Element>(size);        stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         int di = ii % seedSize;         double d = doubles[di];         elts.Add(new Element(d, d));       }       stopwatch.Stop();       long populateMS = stopwatch.ElapsedMilliseconds;        // Measure speed of += operator (calls ctor)       Element operatorCtorResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         operatorCtorResult += elts[ii];       }       stopwatch.Stop();       long operatorCtorMS = stopwatch.ElapsedMilliseconds;        // Measure speed of -= operator (+= without ctor)       Element operatorNoCtorResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         operatorNoCtorResult -= elts[ii];       }       stopwatch.Stop();       long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;        // Measure speed of PlusEqual(Element) method       Element plusEqualResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         plusEqualResult.PlusEqual(elts[ii]);       }       stopwatch.Stop();       long plusEqualMS = stopwatch.ElapsedMilliseconds;        // Measure speed of PlusEqual(double, double) method       Element plusEqualDDResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         Element elt = elts[ii];         plusEqualDDResult.PlusEqual(elt.Left, elt.Right);       }       stopwatch.Stop();       long plusEqualDDMS = stopwatch.ElapsedMilliseconds;        // Measure speed of doing nothing but accessing the Element       Element doNothingResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         Element elt = elts[ii];         double left = elt.Left;         double right = elt.Right;       }       stopwatch.Stop();       long doNothingMS = stopwatch.ElapsedMilliseconds;        // Report results       Assert.AreEqual(1d, operatorCtorResult.Left, "The operator += did not compute the right result!");       Assert.AreEqual(1d, operatorNoCtorResult.Left, "The operator += did not compute the right result!");       Assert.AreEqual(1d, plusEqualResult.Left, "The operator += did not compute the right result!");       Assert.AreEqual(1d, plusEqualDDResult.Left, "The operator += did not compute the right result!");       Assert.AreEqual(1d, doNothingResult.Left, "The operator += did not compute the right result!");        // Report speeds       Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);       Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);       Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);       Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);       Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);       Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);        // Compare speeds       long percentageRatio = 100L * operatorCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * plusEqualDDMS / plusEqualMS;       Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);        operatorCtorMS -= doNothingMS;       operatorNoCtorMS -= doNothingMS;       plusEqualMS -= doNothingMS;       plusEqualDDMS -= doNothingMS;       Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");       percentageRatio = 100L * operatorCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * plusEqualDDMS / plusEqualMS;       Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);     }   } } 

The IL: (aka. what some of the above gets compiled into)

public void PlusEqual(Element that)     { 00000000 push    ebp  00000001 mov     ebp,esp  00000003 push    edi  00000004 push    esi  00000005 push    ebx  00000006 sub     esp,30h  00000009 xor     eax,eax  0000000b mov     dword ptr [ebp-10h],eax  0000000e xor     eax,eax  00000010 mov     dword ptr [ebp-1Ch],eax  00000013 mov     dword ptr [ebp-3Ch],ecx  00000016 cmp     dword ptr ds:[04C87B7Ch],0  0000001d je     00000024  0000001f call    753081B1  00000024 nop              this.Left += that.Left; 00000025 mov     eax,dword ptr [ebp-3Ch]  00000028 fld     qword ptr [ebp+8]  0000002b fadd    qword ptr [eax]  0000002d fstp    qword ptr [eax]        this.Right += that.Right; 0000002f mov     eax,dword ptr [ebp-3Ch]  00000032 fld     qword ptr [ebp+10h]  00000035 fadd    qword ptr [eax+8]  00000038 fstp    qword ptr [eax+8]      } 0000003b nop        0000003c lea     esp,[ebp-0Ch]  0000003f pop     ebx  00000040 pop     esi  00000041 pop     edi  00000042 pop     ebp  00000043 ret     10h   public void PlusEqual(double thatLeft, double thatRight)     { 00000000 push    ebp  00000001 mov     ebp,esp  00000003 push    edi  00000004 push    esi  00000005 push    ebx  00000006 sub     esp,30h  00000009 xor     eax,eax  0000000b mov     dword ptr [ebp-10h],eax  0000000e xor     eax,eax  00000010 mov     dword ptr [ebp-1Ch],eax  00000013 mov     dword ptr [ebp-3Ch],ecx  00000016 cmp     dword ptr ds:[04C87B7Ch],0  0000001d je     00000024  0000001f call    75308159  00000024 nop              this.Left += thatLeft; 00000025 mov     eax,dword ptr [ebp-3Ch]  00000028 fld     qword ptr [ebp+10h]  0000002b fadd    qword ptr [eax]  0000002d fstp    qword ptr [eax]        this.Right += thatRight; 0000002f mov     eax,dword ptr [ebp-3Ch]  00000032 fld     qword ptr [ebp+8]  00000035 fadd    qword ptr [eax+8]  00000038 fstp    qword ptr [eax+8]      } 0000003b nop        0000003c lea     esp,[ebp-0Ch]  0000003f pop     ebx  00000040 pop     esi  00000041 pop     edi  00000042 pop     ebp  00000043 ret     10h  
like image 277
Brian Kennedy Avatar asked Sep 30 '11 20:09

Brian Kennedy


People also ask

Why are structs faster than classes?

So based on the above theory we can say that Struct is faster than Class because: To store class, Apple first finds memory in Heap, then maintain the extra field for RETAIN count. Also, store reference of Heap into Stack. So when it comes to access part, it has to process stack and heap.

Are structs more performant than classes?

The only difference between these two methods is that the one allocates classes, and the other allocates structs. MeasureTestC allocates structs and runs in only 17 milliseconds which is 8.6 times faster than MeasureTestB which allocates classes! That's quite a difference!


2 Answers

I'm getting very different results, much less dramatic. But didn't use the test runner, I pasted the code into a console mode app. The 5% result is ~87% in 32-bit mode, ~100% in 64-bit mode when I try it.

Alignment is critical on doubles, the .NET runtime can only promise an alignment of 4 on a 32-bit machine. Looks to me the test runner is starting the test methods with a stack address that's aligned to 4 instead of 8. The misalignment penalty gets very large when the double crosses a cache line boundary.

like image 194
Hans Passant Avatar answered Oct 11 '22 12:10

Hans Passant


I'm having some difficulty replicating your results.

I took your code:

  • made it a standalone console application
  • built an optimized (release) build
  • increased the "size" factor from 2.5M to 10M
  • ran it from the command line (outside the IDE)

When I did so, I got the following timings which are far different from yours. For the avoidance of doubt, I'll post exactly the code I used.

Here are my timings

Populating List<Element> took 527ms. The PlusEqual() method took 450ms. The 'same' += operator took 386ms. The 'same' -= operator took 446ms. The PlusEqual(double, double) method took 413ms. The do nothing loop took 229ms. The ratio of operator with constructor to method is 85%. The ratio of operator without constructor to method is 99%. The ratio of PlusEqual(double,double) to PlusEqual(Element) is 91%. If we remove the overhead time for the loop accessing the elements from the List... The ratio of operator with constructor to method is 71%. The ratio of operator without constructor to method is 98%. The ratio of PlusEqual(double,double) to PlusEqual(Element) is 83%. 

And these are my edits to your code:

namespace OperatorVsMethod {   public struct Element   {     public double Left;     public double Right;      public Element(double left, double right)     {       this.Left = left;       this.Right = right;     }          public static Element operator +(Element x, Element y)     {       return new Element(x.Left + y.Left, x.Right + y.Right);     }      public static Element operator -(Element x, Element y)     {       x.Left += y.Left;       x.Right += y.Right;       return x;     }          /// <summary>     /// Like the += operator; but faster.     /// </summary>     public void PlusEqual(Element that)     {       this.Left += that.Left;       this.Right += that.Right;     }          /// <summary>     /// Like the += operator; but faster.     /// </summary>     public void PlusEqual(double thatLeft, double thatRight)     {       this.Left += thatLeft;       this.Right += thatRight;     }       }        public class UnitTest1   {     public static void Main()     {       Stopwatch stopwatch = new Stopwatch();        // Populate a List of Elements to multiply together       int seedSize = 4;       List<double> doubles = new List<double>(seedSize);       doubles.Add(2.5d);       doubles.Add(100000d);       doubles.Add(-0.5d);       doubles.Add(-100002d);        int size = 10000000 * seedSize;       List<Element> elts = new List<Element>(size);        stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         int di = ii % seedSize;         double d = doubles[di];         elts.Add(new Element(d, d));       }       stopwatch.Stop();       long populateMS = stopwatch.ElapsedMilliseconds;        // Measure speed of += operator (calls ctor)       Element operatorCtorResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         operatorCtorResult += elts[ii];       }       stopwatch.Stop();       long operatorCtorMS = stopwatch.ElapsedMilliseconds;        // Measure speed of -= operator (+= without ctor)       Element operatorNoCtorResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         operatorNoCtorResult -= elts[ii];       }       stopwatch.Stop();       long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;        // Measure speed of PlusEqual(Element) method       Element plusEqualResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         plusEqualResult.PlusEqual(elts[ii]);       }       stopwatch.Stop();       long plusEqualMS = stopwatch.ElapsedMilliseconds;        // Measure speed of PlusEqual(double, double) method       Element plusEqualDDResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         Element elt = elts[ii];         plusEqualDDResult.PlusEqual(elt.Left, elt.Right);       }       stopwatch.Stop();       long plusEqualDDMS = stopwatch.ElapsedMilliseconds;        // Measure speed of doing nothing but accessing the Element       Element doNothingResult = new Element(1d, 1d);       stopwatch.Reset();       stopwatch.Start();       for (int ii = 0; ii < size; ++ii)       {         Element elt = elts[ii];         double left = elt.Left;         double right = elt.Right;       }       stopwatch.Stop();       long doNothingMS = stopwatch.ElapsedMilliseconds;        // Report speeds       Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);       Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);       Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);       Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);       Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);       Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);        // Compare speeds       long percentageRatio = 100L * operatorCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * plusEqualDDMS / plusEqualMS;       Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);        operatorCtorMS -= doNothingMS;       operatorNoCtorMS -= doNothingMS;       plusEqualMS -= doNothingMS;       plusEqualDDMS -= doNothingMS;       Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");       percentageRatio = 100L * operatorCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;       Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);       percentageRatio = 100L * plusEqualDDMS / plusEqualMS;       Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);     }   } } 
like image 24
Corey Kosak Avatar answered Oct 11 '22 11:10

Corey Kosak