I noticed that a struct wrapping a single float is significantly slower than using a float directly, with approximately half of the performance.
using System;
using System.Diagnostics;

struct Vector1 {
    public float X;

    public Vector1(float x) {
        X = x;
    }

    public static Vector1 operator +(Vector1 a, Vector1 b) {
        a.X = a.X + b.X;
        return a;
    }
}
However, upon adding an additional 'extra' field, some magic seems to happen and performance once again becomes more reasonable:
struct Vector1Magic {
    public float X;
    private bool magic;

    public Vector1Magic(float x) {
        X = x;
        magic = true;
    }

    public static Vector1Magic operator +(Vector1Magic a, Vector1Magic b) {
        a.X = a.X + b.X;
        return a;
    }
}
The code I used to benchmark these is as follows:
class Program {
    static void Main(string[] args) {
        int iterationCount = 1000000000;
        var sw = new Stopwatch();

        sw.Start();
        var total = 0.0f;
        for (int i = 0; i < iterationCount; i++) {
            var v = (float) i;
            total = total + v;
        }
        sw.Stop();
        Console.WriteLine("Float time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("total = {0}", total);

        sw.Reset();
        sw.Start();
        var totalV = new Vector1(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var v = new Vector1(i);
            totalV += v;
        }
        sw.Stop();
        Console.WriteLine("Vector1 time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalV = {0}", totalV);

        sw.Reset();
        sw.Start();
        var totalVm = new Vector1Magic(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var vm = new Vector1Magic(i);
            totalVm += vm;
        }
        sw.Stop();
        Console.WriteLine("Vector1Magic time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalVm = {0}", totalVm);

        Console.Read();
    }
}
With the benchmark results:
Float time was 00:00:02.2444910 for 1000000000 iterations.
Vector1 time was 00:00:04.4490656 for 1000000000 iterations.
Vector1Magic time was 00:00:02.2262701 for 1000000000 iterations.
Compiler/environment settings:
OS: Windows 10 64 bit
Toolchain: VS2017
Framework: .Net 4.6.2
Target: Any CPU, Prefer 32 bit
If 64 bit is set as the target, our results are more predictable, but significantly worse than what we see with Vector1Magic on the 32 bit target:
Float time was 00:00:00.6800014 for 1000000000 iterations.
Vector1 time was 00:00:04.4572642 for 1000000000 iterations.
Vector1Magic time was 00:00:05.7806399 for 1000000000 iterations.
For the real wizards, I've included a dump of the IL here: https://pastebin.com/sz2QLGEx
Further investigation indicates that this seems to be specific to the Windows runtime, as the Mono compiler produces the same IL.
On the Mono runtime, both struct variants are roughly 2x slower than the raw float. This is quite different from the performance we see on .Net.
What's going on here?
*Note this question originally included a flawed benchmark process (Thanks Max Payne for pointing this out), and has been updated to more accurately reflect the timings.
The jit has an optimization known as "struct promotion" where it can effectively replace a struct local or argument with multiple locals, one for each of the struct's fields.
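To make the idea concrete, here is a hand-written sketch of what promotion amounts to for a loop like the one in the question. This is only an illustration at the source level, not what the jit literally emits; the names `Wrapped`, `PromotionSketch`, `SumWithStruct` and `SumPromoted` are made up for this example.

```csharp
using System;

// Hypothetical type for illustration; it mirrors Vector1 from the question.
struct Wrapped
{
    public float X;

    public Wrapped(float x) { X = x; }

    public static Wrapped operator +(Wrapped a, Wrapped b)
    {
        a.X = a.X + b.X;
        return a;
    }
}

static class PromotionSketch
{
    // The loop as written in source: every value lives inside a struct.
    public static float SumWithStruct(int n)
    {
        var total = new Wrapped(0.0f);
        for (int i = 0; i < n; i++)
        {
            total += new Wrapped(i);
        }
        return total.X;
    }

    // Roughly the shape of the same loop once each struct local has been
    // replaced by a plain local for its single field (written by hand here).
    public static float SumPromoted(int n)
    {
        float totalX = 0.0f;
        for (int i = 0; i < n; i++)
        {
            float vX = i;
            totalX = totalX + vX;
        }
        return totalX;
    }

    static void Main()
    {
        // Both versions compute the same sum; only the representation differs.
        Console.WriteLine(SumWithStruct(1000) == SumPromoted(1000));
    }
}
```

When promotion applies, the struct version can be treated much like the second method, which is why it can match the raw float timing.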
Struct promotion of a single struct-wrapped float, however, is disabled. The reasons are a bit obscure, but roughly:
So roughly speaking the jit is prioritizing reducing the costs at call sites over improving the costs at places where the field is used. And sometimes (as in your case above, where operation costs predominate) this is not the right call.
As you have seen, if you make the struct larger then the rules for passing and returning the struct change (it is now passed and returned by reference) and this unblocks promotion.
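If you want to check that the extra field really does make the struct larger, a quick sketch along these lines should do. Note the assumptions: `OneFloat`, `FloatAndFlag` and `SizeSketch` are hypothetical names, and `Marshal.SizeOf` reports the marshaled (unmanaged) size, which is only a proxy for the layout the jit works with.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for Vector1 and Vector1Magic from the question.
struct OneFloat
{
    public float X;
}

struct FloatAndFlag
{
    public float X;
    public bool Flag;
}

static class SizeSketch
{
    // Marshaled size in bytes; an approximation of the managed layout.
    public static int SizeOf<T>() where T : struct
    {
        return Marshal.SizeOf(typeof(T));
    }

    static void Main()
    {
        Console.WriteLine("OneFloat: {0} bytes", SizeOf<OneFloat>());
        Console.WriteLine("FloatAndFlag: {0} bytes", SizeOf<FloatAndFlag>());
    }
}
```

The point is only that the second struct no longer has the size of a single float, so it falls under different passing and returning rules.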
In the CoreCLR sources you can see this logic at play in Compiler::lvaShouldPromoteStructVar.