
Why does adding an extra field to a struct greatly improve its performance?

Tags: c#, .net, clr, mono, cil

I noticed that a struct wrapping a single float is significantly slower than using a float directly, running at approximately half the speed.

using System;
using System.Diagnostics;

struct Vector1 {

    public float X;

    public Vector1(float x) {
        X = x;
    }

    public static Vector1 operator +(Vector1 a, Vector1 b) {
        a.X = a.X + b.X;
        return a;
    }
}

However, upon adding an additional 'extra' field, some magic seems to happen and performance once again becomes more reasonable:

struct Vector1Magic {

    public float X;
    private bool magic;

    public Vector1Magic(float x) {
        X = x;
        magic = true;
    }

    public static Vector1Magic operator +(Vector1Magic a, Vector1Magic b) {
        a.X = a.X + b.X;
        return a;
    }
}

The code I used to benchmark these is as follows:

class Program {
    static void Main(string[] args) {
        int iterationCount = 1000000000;
        var sw = new Stopwatch();
        sw.Start();
        var total = 0.0f;
        for (int i = 0; i < iterationCount; i++) {
            var v = (float) i;
            total = total + v;
        }
        sw.Stop();
        Console.WriteLine("Float time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("total = {0}", total);
        sw.Reset();
        sw.Start();
        var totalV = new Vector1(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var v = new Vector1(i);
            totalV += v;
        }
        sw.Stop();
        Console.WriteLine("Vector1 time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
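        // Print the field; the struct has no ToString override, so printing it directly shows only the type name.
        Console.WriteLine("totalV = {0}", totalV.X);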
        sw.Reset();
        sw.Start();
        var totalVm = new Vector1Magic(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var vm = new Vector1Magic(i);
            totalVm += vm;
        }
        sw.Stop();
        Console.WriteLine("Vector1Magic time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
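        // Print the field; the struct has no ToString override, so printing it directly shows only the type name.
        Console.WriteLine("totalVm = {0}", totalVm.X);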
        Console.Read();
    }
}

With the benchmark results:

Float time was 00:00:02.2444910 for 1000000000 iterations.
Vector1 time was 00:00:04.4490656 for 1000000000 iterations.
Vector1Magic time was 00:00:02.2262701 for 1000000000 iterations.

Compiler/environment settings:

  • OS: Windows 10 64 bit
  • Toolchain: VS2017
  • Framework: .Net 4.6.2
  • Target: Any CPU, Prefer 32 bit

If 64 bit is set as the target, our results are more predictable, but significantly worse than what we see with Vector1Magic on the 32 bit target:

Float time was 00:00:00.6800014 for 1000000000 iterations.
Vector1 time was 00:00:04.4572642 for 1000000000 iterations.
Vector1Magic time was 00:00:05.7806399 for 1000000000 iterations.

For the real wizards, I've included a dump of the IL here: https://pastebin.com/sz2QLGEx

Further investigation indicates that this seems to be specific to the .NET runtime on Windows, as the Mono compiler produces the same IL.

On the Mono runtime, both struct variants are roughly 2x slower than the raw float. This is quite different from the performance we see on .NET.

What's going on here?

*Note: this question originally included a flawed benchmark process (thanks to Max Payne for pointing this out) and has been updated to more accurately reflect the timings.

Varon asked Jun 03 '17 14:06



1 Answer

The JIT has an optimization known as "struct promotion", where it can effectively replace a struct local or argument with multiple locals, one for each of the struct's fields.
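To make the idea concrete, here is a hand-written sketch of what promotion conceptually does to the loop from the question. It is only an illustration at the C# source level, not what the JIT literally emits; the class and method names (PromotionSketch, SumWithStruct, SumPromotedByHand) are made up for this example.

```csharp
using System;

struct Vector1
{
    public float X;

    public Vector1(float x) { X = x; }

    public static Vector1 operator +(Vector1 a, Vector1 b)
    {
        a.X = a.X + b.X;
        return a;
    }
}

// Hypothetical names, used only for this illustration.
public static class PromotionSketch
{
    // The loop as written in the question: every value lives in a struct local.
    public static float SumWithStruct(int n)
    {
        var total = new Vector1(0.0f);
        for (int i = 0; i < n; i++)
        {
            total += new Vector1(i);
        }
        return total.X;
    }

    // Roughly what the loop looks like after promotion: each struct local is
    // replaced by one plain local per field (here a single float each).
    public static float SumPromotedByHand(int n)
    {
        float totalX = 0.0f;
        for (int i = 0; i < n; i++)
        {
            float vX = i;
            totalX = totalX + vX;
        }
        return totalX;
    }

    public static void Main()
    {
        // Both versions compute the same sum; only the shape of the locals differs.
        Console.WriteLine(SumWithStruct(1000) == SumPromotedByHand(1000));
    }
}
```

When promotion applies, the struct version can be compiled as if it were the second method, which is why the promoted case performs like the raw float loop.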

However, struct promotion is disabled for a struct that wraps a single float. The reasons are a bit obscure, but roughly:

  • structs that simply wrap primitive types are treated as integer values of the struct's size when being passed to or returned from calls.
  • during promotion analysis the JIT can't tell whether the struct is ever passed to or returned from a call.
  • the code sequences needed at calls to reclassify an int as a float (and vice versa) are thought to be expensive at runtime.
  • hence the struct is not promoted, and so accesses to and operations on the float field are a bit slower.

So, roughly speaking, the JIT is prioritizing reducing the costs at call sites over improving the costs at places where the field is used. And sometimes (as in your case above, where operation costs predominate) this is not the right call.

As you have seen, if you make the struct larger then the rules for passing and returning the struct change (it is now passed and returned by reference), and this unblocks promotion.
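A quick way to confirm that the extra field really does make the struct larger is to print both sizes. This is a minimal sketch: the SizeCheck class and its methods are hypothetical names, and Marshal.SizeOf reports the marshaled (unmanaged) size, which is only a proxy for the layout the JIT reasons about.

```csharp
using System;
using System.Runtime.InteropServices;

struct Vector1
{
    public float X;

    public Vector1(float x) { X = x; }
}

struct Vector1Magic
{
    public float X;
    private bool magic; // the otherwise unused extra field from the question

    public Vector1Magic(float x)
    {
        X = x;
        magic = true;
    }
}

// Hypothetical helper, used only for this illustration.
public static class SizeCheck
{
    public static int SizeOfVector1()
    {
        return Marshal.SizeOf(typeof(Vector1));
    }

    public static int SizeOfVector1Magic()
    {
        return Marshal.SizeOf(typeof(Vector1Magic));
    }

    public static void Main()
    {
        // The single-float wrapper has the size of one primitive;
        // the variant with the extra field is larger.
        Console.WriteLine("Vector1:      {0} bytes", SizeOfVector1());
        Console.WriteLine("Vector1Magic: {0} bytes", SizeOfVector1Magic());
    }
}
```

The point is only that Vector1Magic no longer has the size of a single primitive, so it falls outside the "struct that simply wraps a primitive" case described above.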

In the CoreCLR sources you can see this logic at play in Compiler::lvaShouldPromoteStructVar.

Andy Ayers answered Sep 16 '22 22:09