
Why are structs so much faster than classes for this specific case?

I have three cases to test the relative performance of classes, classes with inheritance, and structs. These are to be used in tight loops, so performance counts. Dot products are used as part of many algorithms in 2D and 3D geometry, and I have run the profiler on real code. The tests below are indicative of real-world performance problems I have seen.

The results for 100000000 iterations of the loop, each applying the dot product, are

ControlA 208 ms   ( class with inheritance )
ControlB 201 ms   ( class with no inheritance )
ControlC 85  ms   ( struct )

The tests were run without a debugger attached and with optimizations turned on. My question is: what is it about classes in this case that causes them to be so slow?

I presumed the JIT would still be able to inline all the calls, class or struct, so in effect the results should be identical. Note that if I disable optimizations then my results are identical.

ControlA 3239
ControlB 3228
ControlC 3213

They are always within 20ms of each other if the test is re-run.

The classes under investigation

using System;
using System.Diagnostics;

public class PointControlA
{
    public double X
    {
        get;
        set;
    }

    public double Y
    {
        get;
        set;
    }

    public PointControlA(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public class Point3ControlA : PointControlA
{
    public double Z
    {
        get;
        set;
    }

    public Point3ControlA(double x, double y, double z): base (x, y)
    {
        Z = z;
    }

    public static double Dot(Point3ControlA a, Point3ControlA b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }
}

public class Point3ControlB
{
    public double X
    {
        get;
        set;
    }

    public double Y
    {
        get;
        set;
    }

    public double Z
    {
        get;
        set;
    }

    public Point3ControlB(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static double Dot(Point3ControlB a, Point3ControlB b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }
}

public struct Point3ControlC
{
    public double X
    {
        get;
        set;
    }

    public double Y
    {
        get;
        set;
    }

    public double Z
    {
        get;
        set;
    }

    public Point3ControlC(double x, double y, double z):this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static double Dot(Point3ControlC a, Point3ControlC b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }
}

Test Script

public class Program
{
    public static void TestStructClass()
    {
        var vControlA = new Point3ControlA(11, 12, 13);
        var vControlB = new Point3ControlB(11, 12, 13);
        var vControlC = new Point3ControlC(11, 12, 13);
        var sw = Stopwatch.StartNew();
        var n = 10000000;
        double acc = 0;
        sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            acc += Point3ControlA.Dot(vControlA, vControlA);
        }

        Console.WriteLine("ControlA " + sw.ElapsedMilliseconds);
        acc = 0;
        sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            acc += Point3ControlB.Dot(vControlB, vControlB);
        }

        Console.WriteLine("ControlB " + sw.ElapsedMilliseconds);
        acc = 0;
        sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            acc += Point3ControlC.Dot(vControlC, vControlC);
        }

        Console.WriteLine("ControlC " + sw.ElapsedMilliseconds);
    }

    public static void Main()
    {
        TestStructClass();
    }
}

This dotnet fiddle is proof of compilation only. It does not show the performance differences.
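
For anyone reproducing the timings locally, here is a minimal sketch of one way to structure a single measurement. It is not the code that produced the numbers above. The names Point3, DotBenchmark, RunLoop and TimeMs are hypothetical. The idea is to call the loop once before timing, so that JIT compilation is not included in the measurement, and to return the accumulator so that the optimizer cannot treat the loop as dead code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for Point3ControlC from the question.
public struct Point3
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static double Dot(Point3 a, Point3 b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }
}

public static class DotBenchmark
{
    // The loop under test. Returning acc keeps the result observable.
    public static double RunLoop(int n)
    {
        var v = new Point3(11, 12, 13);
        double acc = 0;
        for (int i = 0; i < n; i++)
        {
            acc += Point3.Dot(v, v);
        }
        return acc;
    }

    // Times n iterations after one untimed warm-up call.
    public static long TimeMs(int n, out double checksum)
    {
        RunLoop(1000); // warm-up: forces the loop to be JIT-compiled first
        var sw = Stopwatch.StartNew();
        checksum = RunLoop(n);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

The loop is deliberately written out rather than passed in as a delegate, because a delegate call would itself get in the way of the inlining that this question is about.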

I am trying to explain to a vendor why their choice to use classes instead of structs for small numeric types is a bad idea. I now have the test case to prove it but I can't understand why.

NOTE : I have tried to set a breakpoint in the debugger with JIT optimizations turned on but the debugger will not break. Looking at the IL with JIT optimizations turned off doesn't tell me anything.

EDIT

After the answer by @pkuderov I took his code and played with it. I changed the code and found that if I forced inlining via

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static double Dot(Point3Class a)
    {
        return a.X * a.X + a.Y * a.Y + a.Z * a.Z;
    }

the difference between the struct and the class for the dot product vanished. It is not clear to me why the attribute is unnecessary in some setups but was needed in mine. However, I did not give up. There is still a performance problem with the vendor code, and I think the dot product is not the best example.
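
For completeness, here is a self-contained sketch of the snippet above. The shape of Point3Class is an assumption, since @pkuderov's original class is not reproduced in this post. The only requirement beyond the attribute itself is the using directive for System.Runtime.CompilerServices, which is where MethodImpl and MethodImplOptions live.

```csharp
using System;
using System.Runtime.CompilerServices;

// Assumed shape of the class; only Dot and the attribute come from the post.
public sealed class Point3Class
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3Class(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Asks the JIT to inline this method where it can. This is a request,
    // not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static double Dot(Point3Class a)
    {
        return a.X * a.X + a.Y * a.Y + a.Z * a.Z;
    }
}
```

Calling Point3Class.Dot(new Point3Class(11, 12, 13)) gives 434, the same value each iteration of the test loop adds to the accumulator.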

I modified @pkuderov's code to implement Vector Add which will create new instances of the structs and classes. The results are here

https://gist.github.com/bradphelan/9b383c8e99edc38068fcc0dccc8a7b48

In the example I also modified the code to pick a pseudo-random vector from an array, to avoid the problem of the instances sticking in the registers (I hope).

The results show that:

DotProduct performance is identical, or maybe even faster, for classes.
Vector Add, and I assume anything else that creates a new object, is slower for classes.

Add class/class 2777ms
Add struct/struct 2457ms

DotProd class/class 1909ms
DotProd struct/struct 2108ms

The full code and results are here if anybody wants to try it out.
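
Since the gist is not reproduced in this post, here is a rough sketch of what the two Add variants might look like. The type bodies and the AddDemo helper are assumptions and not the code from the gist. The point it illustrates is that the struct version returns a new value, while the class version has to allocate a new heap object on every call.

```csharp
using System;

public struct Point3Struct
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3Struct(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Returns a new value; no heap allocation is involved.
    public static Point3Struct Add(Point3Struct a, Point3Struct b)
    {
        return new Point3Struct(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

public sealed class Point3Class
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3Class(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Allocates a new object on the heap for every result.
    public static Point3Class Add(Point3Class a, Point3Class b)
    {
        return new Point3Class(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

// Hypothetical helper: sums n vectors picked cyclically from an array.
public static class AddDemo
{
    public static Point3Struct SumStructs(Point3Struct[] points, int n)
    {
        int m = points.Length;
        var acc = new Point3Struct(0, 0, 0);
        for (int i = 0; i < n; i++)
        {
            acc = Point3Struct.Add(acc, points[(i + 1) % m]);
        }
        return acc;
    }

    public static Point3Class SumClasses(Point3Class[] points, int n)
    {
        int m = points.Length;
        var acc = new Point3Class(0, 0, 0);
        for (int i = 0; i < n; i++)
        {
            acc = Point3Class.Add(acc, points[(i + 1) % m]);
        }
        return acc;
    }
}
```

Both helpers compute the same sums; only the storage of the intermediate results differs.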

Edit Again

For the vector add example where an array of vectors is summed together the struct version keeps the accumulator in 3 registers

 var accStruct = new Point3Struct(0, 0, 0);
 for (int i = 0; i < n; i++)
     accStruct = Point3Struct.Add(accStruct, pointStruct[(i + 1) % m]);

the asm body is

// load the next vector into a register
00007FFA3CA2240E  vmovsd      xmm3,qword ptr [rax]  
00007FFA3CA22413  vmovsd      xmm4,qword ptr [rax+8]  
00007FFA3CA22419  vmovsd      xmm5,qword ptr [rax+10h]  
// Sum the accumulator (the accumulator stays in the registers )
00007FFA3CA2241F  vaddsd      xmm0,xmm0,xmm3  
00007FFA3CA22424  vaddsd      xmm1,xmm1,xmm4  
00007FFA3CA22429  vaddsd      xmm2,xmm2,xmm5  

but the class-based version reads the accumulator from main memory and writes it back on every iteration, which is inefficient

var accPC = new Point3Class(0, 0, 0);
for (int i = 0; i < n; i++)
    accPC = Point3Class.Add(accPC, pointClass[(i + 1) % m]);

the asm body is

// Read and add both accumulator X and Xnext from main memory
00007FFA3CA2224A  vmovsd      xmm0,qword ptr [r14+8]     
00007FFA3CA22250  vmovaps     xmm7,xmm0                   
00007FFA3CA22255  vaddsd      xmm7,xmm7,mmword ptr [r12+8]  


// Read and add both accumulator Y and Ynext from main memory
00007FFA3CA2225C  vmovsd      xmm0,qword ptr [r14+10h]  
00007FFA3CA22262  vmovaps     xmm8,xmm0  
00007FFA3CA22267  vaddsd      xmm8,xmm8,mmword ptr [r12+10h] 

// Read and add both accumulator Z and Znext from main memory
00007FFA3CA2226E  vmovsd      xmm9,qword ptr [r14+18h]  
00007FFA3CA22283  vmovaps     xmm0,xmm9  
00007FFA3CA22288  vaddsd      xmm0,xmm0,mmword ptr [r12+18h]

// Move accumulator X,Y,Z back to main memory.
00007FFA3CA2228F  vmovsd      qword ptr [rax+8],xmm7  
00007FFA3CA22295  vmovsd      qword ptr [rax+10h],xmm8  
00007FFA3CA2229B  vmovsd      qword ptr [rax+18h],xmm0  
asked Jul 06 '17 12:07 by bradgonesurfing


1 Answer

Update

After spending some time thinking about the problem, I think I agree with @DavidHaim that memory jump overhead is not the issue here, because of caching.

I've also added more options to your tests (and removed the first one, with inheritance). So I have:

  • cl = variable of class with 3 points:
    • Dot(cl, cl) - initial method
    • Dot(cl) - which is "square product"
    • Dot(cl.X, cl.Y, cl.Z, cl.X, cl.Y, cl.Z) aka Dot(cl.xyz)- pass fields
  • st = variable of struct with 3 points:
    • Dot(st, st) - initial
    • Dot(st) - square product
    • Dot(st.X, st.Y, st.Z, st.X, st.Y, st.Z) aka Dot(st.xyz) - pass fields
  • st6 = variable of struct with 6 points:
    • Dot(st6) - wanted to check if size of struct matters
  • Dot(x, y, z, x, y, z) aka Dot(xyz) - just local const double variables.
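
To make the call shapes in the list concrete, here is a sketch of the three Dot overloads for the struct case. The class case would look the same with a class in place of the struct. The type and method bodies are assumptions written for illustration and are not copied from the linked test code.

```csharp
using System;

public struct Point3Struct
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3Struct(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class DotVariants
{
    // Dot(st, st): the initial method, taking two instances.
    public static double Dot(Point3Struct a, Point3Struct b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }

    // Dot(st): the "square product" of one instance with itself.
    public static double Dot(Point3Struct a)
    {
        return a.X * a.X + a.Y * a.Y + a.Z * a.Z;
    }

    // Dot(st.xyz) and Dot(xyz): the components are passed as plain doubles.
    public static double Dot(double x1, double y1, double z1,
                             double x2, double y2, double z2)
    {
        return x1 * x2 + y1 * y2 + z1 * z2;
    }
}
```

All three compute the same value for the same input, so any timing difference between them comes from how the arguments are passed and optimized, not from the arithmetic.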

Result times are:

  • Dot(cl.xyz) is the worst ~570ms,
  • Dot(st6), Dot(st.xyz) is the second worst ~440ms and ~480ms
  • the others are ~325ms

...And I'm not really sure why I see these results.

Maybe for plain primitive types the compiler does more aggressive pass-by-register optimizations; maybe it is more certain of lifetime boundaries or constantness, which again allows more aggressive optimizations. Maybe some kind of loop unwinding.

I think my expertise is just not enough :) But still, my results contradict yours.

The full test code, with results on my machine and the generated IL code, is available here.


In C# classes are reference types and structs are value types. One major effect is that value types can be (and most of the time are!) allocated on the stack, while reference types are always allocated on the heap.

So every time you get access to the inner state of a reference type variable you need to dereference the pointer to memory in the heap (it's a kind of jump), while for value types it's already on the stack or even optimized out to registers.

I think you see a difference because of this.

P.S. btw, by "most of the time are" I meant boxing; it's a technique used to place value type objects on the heap (e.g. to cast value types to an interface or for dynamic method call binding).
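
A small sketch of what boxing looks like in practice. The names IHasLengthSquared, Point3Value and BoxingDemo are hypothetical and exist only for this example. Assigning a struct to an object variable or to an interface variable copies it into a new heap object, and that copy is independent of the original.

```csharp
using System;

public interface IHasLengthSquared
{
    double LengthSquared();
}

public struct Point3Value : IHasLengthSquared
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Point3Value(double x, double y, double z) : this()
    {
        X = x;
        Y = y;
        Z = z;
    }

    public double LengthSquared()
    {
        return X * X + Y * Y + Z * Z;
    }
}

public static class BoxingDemo
{
    // Returns true if the boxed copies were unaffected by a later change
    // to the original struct variable.
    public static bool BoxedCopiesAreIndependent()
    {
        var p = new Point3Value(1, 2, 3);

        object boxed = p;                 // boxing: p is copied to the heap
        IHasLengthSquared viaIface = p;   // casting to an interface also boxes

        p.X = 100;                        // changes only the local variable

        var unboxed = (Point3Value)boxed; // unboxing: copies the value back out
        return unboxed.X == 1 && viaIface.LengthSquared() == 14;
    }
}
```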

answered Sep 29 '22 10:09 by pkuderov