
Why is casting a struct via Pointer slow, while Unsafe.As is fast?

Background

I wanted to make a few integer-sized structs (i.e. 32 and 64 bits) that are easily convertible to and from primitive unmanaged types of the same size (e.g. Int32 and UInt32 for the 32-bit struct).

The structs would then expose additional functionality for bit manipulation / indexing that is not available on integer types directly. Basically, as a sort of syntactic sugar, improving readability and ease of use.

The important part, however, was performance: there should be essentially zero cost for this extra abstraction (at the end of the day the CPU should "see" the same bits as if it were dealing with primitive ints).

Sample Struct

Below is just the very basic struct I came up with. It does not have all the functionality, but enough to illustrate my questions:

[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32 {
  [FieldOffset(3)]
  public byte Byte1;
  [FieldOffset(2)]
  public ushort UShort1;
  [FieldOffset(2)]
  public byte Byte2;
  [FieldOffset(1)]
  public byte Byte3;
  [FieldOffset(0)]
  public ushort UShort2;
  [FieldOffset(0)]
  public byte Byte4;

  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static unsafe implicit operator Mask32(int i) => *(Mask32*)&i;
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static unsafe implicit operator Mask32(uint i) => *(Mask32*)&i;
}

The Test

I wanted to test the performance of this struct. In particular, I wanted to see whether it could give me the individual bytes just as quickly as regular bitwise arithmetic, for example (i >> 8) & 0xFF to get the 3rd byte.

Below you will see a benchmark I came up with:

public unsafe class MyBenchmark {

  const int count = 50000;

  [Benchmark(Baseline = true)]
  public static void Direct() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      //var b1 = i.Byte1();
      //var b2 = i.Byte2();
      var b3 = i.Byte3();
      //var b4 = i.Byte4();
      j += b3;
    }
  }


  [Benchmark]
  public static void ViaStructPointer() {
    var j = 0;
    int i = 0;
    var s = (Mask32*)&i;
    for (; i < count; i++) {
      //var b1 = s->Byte1;
      //var b2 = s->Byte2;
      var b3 = s->Byte3;
      //var b4 = s->Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaStructPointer2() {
    var j = 0;
    int i = 0;
    for (; i < count; i++) {
      var s = *(Mask32*)&i;
      //var b1 = s.Byte1;
      //var b2 = s.Byte2;
      var b3 = s.Byte3;
      //var b4 = s.Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaStructCast() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      Mask32 m = i;
      //var b1 = m.Byte1;
      //var b2 = m.Byte2;
      var b3 = m.Byte3;
      //var b4 = m.Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaUnsafeAs() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      var m = Unsafe.As<int, Mask32>(ref i);
      //var b1 = m.Byte1;
      //var b2 = m.Byte2;
      var b3 = m.Byte3;
      //var b4 = m.Byte4;
      j += b3;
    }
  }

}

Byte1(), Byte2(), Byte3(), and Byte4() are extension methods on int that do get inlined; each one extracts the n-th byte with a shift, a mask, and a cast:

[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte1(this int it) => (byte)(it >> 24);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte4(this int it) => (byte)it;
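
To make the correspondence between the two approaches concrete, here is a minimal sketch (not part of the original benchmark) that checks the struct fields against the shift-based helpers. Mask32Sketch, its Value field, and ByteCheck are hypothetical names introduced for illustration. The check assumes a little-endian machine, because only there does offset 3 hold the most significant byte.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical, trimmed-down copy of Mask32 for illustration only.
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32Sketch
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public byte Byte4;

    // Assumption: an overlapping int field (not in the original struct)
    // so the sketch can set all four bytes without pointers.
    [FieldOffset(0)] public int Value;
}

public static class ByteCheck
{
    // Same arithmetic as the extension methods above, as plain static methods.
    public static byte Byte1(int it) => (byte)(it >> 24);
    public static byte Byte2(int it) => (byte)((it >> 16) & 0xFF);
    public static byte Byte3(int it) => (byte)((it >> 8) & 0xFF);
    public static byte Byte4(int it) => (byte)it;

    // True when every overlapped field agrees with its shift-based helper.
    public static bool FieldsMatchShifts(int value)
    {
        var m = new Mask32Sketch { Value = value };
        return m.Byte1 == Byte1(value)
            && m.Byte2 == Byte2(value)
            && m.Byte3 == Byte3(value)
            && m.Byte4 == Byte4(value);
    }

    public static void Main()
    {
        // The comparison is only expected to hold on little-endian hardware.
        Console.WriteLine("Little-endian: " + BitConverter.IsLittleEndian);
        Console.WriteLine("Fields match shifts: " + FieldsMatchShifts(0x11223344));
    }
}
```

On a big-endian machine the field offsets would map to the opposite bytes, so the struct and the shift-based helpers would no longer agree.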

EDIT: Fixed the code to make sure variables are actually used. Also commented out 3 of 4 variables to really test struct casting / member access rather than actually using the variables.

The Results

I ran these in the Release build with optimizations on x64.

Intel Core i7-3770K CPU 3.50GHz (Ivy Bridge), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3410223 Hz, Resolution=293.2360 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
  DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0


            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  14.47 us | 0.3314 us | 0.2938 us |   1.00 |     0.00 |
  ViaStructPointer | 111.32 us | 0.6481 us | 0.6062 us |   7.70 |     0.15 |
 ViaStructPointer2 | 102.31 us | 0.7632 us | 0.7139 us |   7.07 |     0.14 |
     ViaStructCast |  29.00 us | 0.3159 us | 0.2800 us |   2.01 |     0.04 |
       ViaUnsafeAs |  14.32 us | 0.0955 us | 0.0894 us |   0.99 |     0.02 |

EDIT: New results after fixing the code:

            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  57.51 us | 1.1070 us | 1.0355 us |   1.00 |     0.00 |
  ViaStructPointer | 203.20 us | 3.9830 us | 3.5308 us |   3.53 |     0.08 |
 ViaStructPointer2 | 198.08 us | 1.8411 us | 1.6321 us |   3.45 |     0.06 |
     ViaStructCast |  79.68 us | 1.5478 us | 1.7824 us |   1.39 |     0.04 |
       ViaUnsafeAs |  57.01 us | 0.8266 us | 0.6902 us |   0.99 |     0.02 |

Questions

The benchmark results were surprising to me, which is why I have a few questions:

EDIT: Fewer questions remain after altering the code so that the variables actually get used.

  1. Why is the pointer stuff so slow?
  2. Why is the cast taking twice as long as the baseline case? Aren't implicit/explicit operators inlined?
  3. How come the new System.Runtime.CompilerServices.Unsafe package (v. 4.5.0) is so fast? I thought it would at least involve a method call...
  4. More generally, how can I make essentially a zero-cost struct that would simply act as a "window" onto some memory or a biggish primitive type like UInt64 so that I can more effectively manipulate / read that memory? What's the best practice here?
asked Jun 15 '18 07:06 by Fit Dev


1 Answer

When you take the address of a local, the JIT generally has to keep that local on the stack. That's the case here. In the ViaStructPointer version, i is kept on the stack. In the ViaUnsafeAs version, i is copied to a temp and the temp is kept on the stack. The former is slower because i is also used to control the iteration of the loop, so every increment and comparison of the loop counter goes through memory.

You can get pretty close to the ViaUnsafeAs perf with the following code, where you explicitly make a copy:

    public static int ViaStructPointer2()
    {
        int total = 0;

        for (int i = 0; i < count; i++)
        {
            int j = i;
            var s = (Mask32*)&j;
            total += s->Byte1;
        }

        return total;
    }

ViaStructPointer  took 00:00:00.1147793
ViaUnsafeAs       took 00:00:00.0282828
ViaStructPointer2 took 00:00:00.0257589
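
The same idea can be applied to the conversion operator itself. The following is a minimal sketch under assumptions, not something taken from the answer: Mask32ViaUnsafe, UnsafeAsDemo, and SumByte3 are hypothetical names, and the operator uses Unsafe.As from System.Runtime.CompilerServices (the package mentioned in the question) on its by-value parameter, which is already a copy of the caller's variable.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical variant of Mask32 whose conversion avoids pointer syntax.
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32ViaUnsafe
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public byte Byte4;

    // 'i' is passed by value, so reinterpreting it never takes the
    // address of the caller's loop counter.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static implicit operator Mask32ViaUnsafe(int i)
        => Unsafe.As<int, Mask32ViaUnsafe>(ref i);
}

public static class UnsafeAsDemo
{
    // Sums the third byte of every value in [0, count), returning the total
    // so the loop body cannot be discarded as dead code.
    public static int SumByte3(int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            Mask32ViaUnsafe m = i; // implicit conversion copies i
            total += m.Byte3;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumByte3(50000));
    }
}
```

Whether this matches ViaUnsafeAs on a given runtime is not something the timings above show; it would need to be measured with the same benchmark.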
answered Oct 23 '22 09:10 by Andy Ayers