I wanted to make a few integer-sized structs (i.e. 32 and 64 bits) that are easily convertible to/from primitive unmanaged types of the same size (e.g. Int32 and UInt32 for the 32-bit struct in particular).
The structs would then expose additional functionality for bit manipulation / indexing that is not available on integer types directly. Basically, they would act as syntactic sugar that improves readability and ease of use.
The important part, however, was performance: there should be essentially zero cost for this extra abstraction (at the end of the day the CPU should "see" the same bits as if it were dealing with primitive ints).
Below is the very basic struct I came up with. It does not have all the functionality, but it is enough to illustrate my questions:
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32 {
    [FieldOffset(3)]
    public byte Byte1;
    [FieldOffset(2)]
    public ushort UShort1;
    [FieldOffset(2)]
    public byte Byte2;
    [FieldOffset(1)]
    public byte Byte3;
    [FieldOffset(0)]
    public ushort UShort2;
    [FieldOffset(0)]
    public byte Byte4;

    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe implicit operator Mask32(int i) => *(Mask32*)&i;
    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe implicit operator Mask32(uint i) => *(Mask32*)&i;
}
I wanted to test the performance of this struct. In particular, I wanted to see whether it could get the individual bytes just as quickly as regular bitwise arithmetic, e.g. (i >> 8) & 0xFF to get the 3rd byte.
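The comparison only makes sense if the overlaid fields really line up with the shift-and-mask results, which depends on byte order. Here is a minimal sketch of that check, assuming a little-endian machine (such as the x64 box used for the benchmark). Mask32Sketch, its Value field and LayoutCheck are hypothetical names introduced only for this illustration; they are not part of the original struct.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical trimmed-down copy of Mask32 (renamed to avoid clashing with the original).
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32Sketch
{
    [FieldOffset(0)] public uint Value;  // added for illustration only; overlays all four bytes
    [FieldOffset(3)] public byte Byte1;  // most significant byte on little-endian
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public byte Byte4;  // least significant byte on little-endian
}

public static class LayoutCheck
{
    // Returns true when every overlaid byte field agrees with shift-and-mask extraction.
    public static bool FieldsMatchShifts(uint value)
    {
        var m = new Mask32Sketch { Value = value };
        return m.Byte1 == (byte)(value >> 24)
            && m.Byte2 == (byte)((value >> 16) & 0xFF)
            && m.Byte3 == (byte)((value >> 8) & 0xFF)
            && m.Byte4 == (byte)value;
    }

    public static void Main()
    {
        // The field offsets above assume this is true.
        Console.WriteLine(BitConverter.IsLittleEndian);
        Console.WriteLine(FieldsMatchShifts(0x11223344u));
    }
}
```

On a big-endian platform the offsets would have to be mirrored for the two approaches to agree, so the struct version is not a drop-in replacement for the shifts everywhere.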
Below you will see a benchmark I came up with:
public unsafe class MyBenchmark {
    const int count = 50000;

    [Benchmark(Baseline = true)]
    public static void Direct() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            //var b1 = i.Byte1();
            //var b2 = i.Byte2();
            var b3 = i.Byte3();
            //var b4 = i.Byte4();
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaStructPointer() {
        var j = 0;
        int i = 0;
        var s = (Mask32*)&i;
        for (; i < count; i++) {
            //var b1 = s->Byte1;
            //var b2 = s->Byte2;
            var b3 = s->Byte3;
            //var b4 = s->Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaStructPointer2() {
        var j = 0;
        int i = 0;
        for (; i < count; i++) {
            var s = *(Mask32*)&i;
            //var b1 = s.Byte1;
            //var b2 = s.Byte2;
            var b3 = s.Byte3;
            //var b4 = s.Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaStructCast() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            Mask32 m = i;
            //var b1 = m.Byte1;
            //var b2 = m.Byte2;
            var b3 = m.Byte3;
            //var b4 = m.Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaUnsafeAs() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            var m = Unsafe.As<int, Mask32>(ref i);
            //var b1 = m.Byte1;
            //var b2 = m.Byte2;
            var b3 = m.Byte3;
            //var b4 = m.Byte4;
            j += b3;
        }
    }
}
Byte1(), Byte2(), Byte3(), and Byte4() are just extension methods that do get inlined and simply get the n-th byte by doing bitwise operations and casting:
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte1(this int it) => (byte)(it >> 24);

[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);

[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);

[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte4(this int it) => (byte)it;
EDIT: Fixed the code to make sure variables are actually used. Also commented out 3 of 4 variables to really test struct casting / member access rather than actually using the variables.
I ran these in the Release build with optimizations on x64.
Intel Core i7-3770K CPU 3.50GHz (Ivy Bridge), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3410223 Hz, Resolution=293.2360 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
Direct | 14.47 us | 0.3314 us | 0.2938 us | 1.00 | 0.00 |
ViaStructPointer | 111.32 us | 0.6481 us | 0.6062 us | 7.70 | 0.15 |
ViaStructPointer2 | 102.31 us | 0.7632 us | 0.7139 us | 7.07 | 0.14 |
ViaStructCast | 29.00 us | 0.3159 us | 0.2800 us | 2.01 | 0.04 |
ViaUnsafeAs | 14.32 us | 0.0955 us | 0.0894 us | 0.99 | 0.02 |
EDIT: New results after fixing the code:
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
Direct | 57.51 us | 1.1070 us | 1.0355 us | 1.00 | 0.00 |
ViaStructPointer | 203.20 us | 3.9830 us | 3.5308 us | 3.53 | 0.08 |
ViaStructPointer2 | 198.08 us | 1.8411 us | 1.6321 us | 3.45 | 0.06 |
ViaStructCast | 79.68 us | 1.5478 us | 1.7824 us | 1.39 | 0.04 |
ViaUnsafeAs | 57.01 us | 0.8266 us | 0.6902 us | 0.99 | 0.02 |
The benchmark results were surprising for me, and that's why I have a few questions:
EDIT: Fewer questions remain after altering the code so that the variables actually get used.
[...] System.Runtime.CompilerServices.Unsafe package (v. 4.5.0) is so fast? I thought it would at least involve a method call...
[...] UInt64 so that I can more effectively manipulate / read that memory? What's the best practice here?
When you take the address of a local, the JIT generally has to keep that local on the stack. That's the case here. In the ViaStructPointer version, i is kept on the stack. In ViaUnsafeAs, i is copied to a temp and the temp is kept on the stack. The former is slower because i is also used to control the iteration of the loop.
You can get pretty close to the ViaUnsafeAs perf with the following code, where you explicitly make a copy:
public static int ViaStructPointer2()
{
    int total = 0;
    for (int i = 0; i < count; i++)
    {
        int j = i;
        var s = (Mask32*)&j;
        total += s->Byte1;
    }
    return total;
}
ViaStructPointer took 00:00:00.1147793
ViaUnsafeAs took 00:00:00.0282828
ViaStructPointer2 took 00:00:00.0257589
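The same idea might also be applied to the conversion operators themselves: instead of taking the address of the parameter with pointer syntax, the operator can reinterpret it through Unsafe.As from the System.Runtime.CompilerServices.Unsafe package mentioned in the question. The sketch below is an assumption rather than something measured here; Mask32Alt, Mask32AltDemo and SumThirdBytes are hypothetical names, and whether the JIT produces code as good as ViaUnsafeAs would need to be confirmed with the benchmark.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical variant of Mask32 whose conversions avoid pointer syntax.
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32Alt
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public byte Byte4;

    // Reinterprets the by-value parameter as Mask32Alt; the result is copied on return.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static implicit operator Mask32Alt(int i) => Unsafe.As<int, Mask32Alt>(ref i);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static implicit operator Mask32Alt(uint i) => Unsafe.As<uint, Mask32Alt>(ref i);
}

public static class Mask32AltDemo
{
    // Mirrors the loop shape of ViaStructCast, but returns the total so the work is observable.
    public static int SumThirdBytes(int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            Mask32Alt m = i;   // the loop counter itself never has its address taken here
            total += m.Byte3;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumThirdBytes(50000));
    }
}
```

Returning the accumulated total, as the answer's ViaStructPointer2 does, also keeps the JIT from discarding the loop body as dead code.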