Why is casting a struct via Pointer slow, while Unsafe.As is fast?

Background

I wanted to make a few integer-sized structs (i.e. 32 and 64 bits) that are easily convertible to/from primitive unmanaged types of the same size (e.g. Int32 and UInt32 for the 32-bit struct).

The structs would then expose additional functionality for bit manipulation / indexing that is not available on integer types directly. Essentially, they would act as syntactic sugar that improves readability and ease of use.

The important part, however, was performance: this extra abstraction should have essentially zero cost (at the end of the day, the CPU should "see" the same bits as if it were dealing with primitive ints).

Sample Struct

Below is the basic struct I came up with. It does not have all the functionality, but it is enough to illustrate my questions:

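using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Offsets assume a little-endian CPU (x86/x64): offset 0 is the least significant byte.
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32 {
  [FieldOffset(3)]
  public byte Byte1;     // bits 24-31
  [FieldOffset(2)]
  public ushort UShort1; // bits 16-31
  [FieldOffset(2)]
  public byte Byte2;     // bits 16-23
  [FieldOffset(1)]
  public byte Byte3;     // bits 8-15
  [FieldOffset(0)]
  public ushort UShort2; // bits 0-15
  [FieldOffset(0)]
  public byte Byte4;     // bits 0-7

  // Reinterpret the integer's bits as Mask32 (requires unsafe code to be enabled for the project).
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static unsafe implicit operator Mask32(int i) => *(Mask32*)&i;
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static unsafe implicit operator Mask32(uint i) => *(Mask32*)&i;
}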
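The field offsets above rely on the machine being little-endian (as x86/x64 is): the byte at offset 0 is the least significant one, so Byte4 corresponds to i & 0xFF and Byte1 to (i >> 24) & 0xFF. A minimal sketch to check that correspondence, assuming .NET Core 3.0 or later (or the System.Runtime.CompilerServices.Unsafe package on .NET Framework). Mask32Bytes and LayoutDemo are hypothetical names, and Unsafe.As stands in for the pointer cast so that no unsafe context is needed:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical struct with the same byte offsets as Mask32 (ushort fields omitted).
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32Bytes {
  [FieldOffset(3)] public byte Byte1; // (i >> 24) & 0xFF on little-endian
  [FieldOffset(2)] public byte Byte2; // (i >> 16) & 0xFF
  [FieldOffset(1)] public byte Byte3; // (i >> 8) & 0xFF
  [FieldOffset(0)] public byte Byte4; // i & 0xFF
}

public static class LayoutDemo {
  // Reinterprets the int's four bytes as Mask32Bytes and returns a copy.
  public static Mask32Bytes FromInt(int value) =>
    Unsafe.As<int, Mask32Bytes>(ref value);
}
```

For example, LayoutDemo.FromInt(0x11223344).Byte3 should be 0x33 on a little-endian machine, the same value that (0x11223344 >> 8) & 0xFF produces. On a big-endian machine the offsets would have to be reversed.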

The Test

I wanted to test the performance of this struct. In particular, I wanted to see whether it would let me get the individual bytes just as quickly as regular bitwise arithmetic such as (i >> 8) & 0xFF (which gets the 3rd byte, for example).

Here is the benchmark I came up with:

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public unsafe class MyBenchmark {

  const int count = 50000;

  [Benchmark(Baseline = true)]
  public static void Direct() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      //var b1 = i.Byte1();
      //var b2 = i.Byte2();
      var b3 = i.Byte3();
      //var b4 = i.Byte4();
      j += b3;
    }
  }


  [Benchmark]
  public static void ViaStructPointer() {
    var j = 0;
    int i = 0;
    var s = (Mask32*)&i;
    for (; i < count; i++) {
      //var b1 = s->Byte1;
      //var b2 = s->Byte2;
      var b3 = s->Byte3;
      //var b4 = s->Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaStructPointer2() {
    var j = 0;
    int i = 0;
    for (; i < count; i++) {
      var s = *(Mask32*)&i;
      //var b1 = s.Byte1;
      //var b2 = s.Byte2;
      var b3 = s.Byte3;
      //var b4 = s.Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaStructCast() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      Mask32 m = i;
      //var b1 = m.Byte1;
      //var b2 = m.Byte2;
      var b3 = m.Byte3;
      //var b4 = m.Byte4;
      j += b3;
    }
  }

  [Benchmark]
  public static void ViaUnsafeAs() {
    var j = 0;
    for (int i = 0; i < count; i++) {
      var m = Unsafe.As<int, Mask32>(ref i);
      //var b1 = m.Byte1;
      //var b2 = m.Byte2;
      var b3 = m.Byte3;
      //var b4 = m.Byte4;
      j += b3;
    }
  }

}

Byte1(), Byte2(), Byte3(), and Byte4() are simple extension methods that do get inlined; they extract the n-th byte with bitwise operations and a cast:

using System.Diagnostics;
using System.Runtime.CompilerServices;

// Extension methods must live in a non-nested static class (the class name is hypothetical).
public static class IntByteExtensions {
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static byte Byte1(this int it) => (byte)(it >> 24);
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
  [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static byte Byte4(this int it) => (byte)it;
}
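One subtlety in Byte1: it >> 24 is an arithmetic shift, so negative values are filled with ones from the left, but the (byte) cast keeps only the low 8 bits, so the result is still the correct top byte. A small sketch that compares the shift-based helpers with the in-memory byte order reported by BitConverter.GetBytes (which follows the machine's endianness, so the comparison is only meaningful on a little-endian machine). ByteExtraction and MatchesMemoryLayout are hypothetical names:

```csharp
using System;

// Hypothetical helper class; the four methods mirror the question's extension methods.
public static class ByteExtraction {
  // (byte) keeps only the low 8 bits, so the sign-extending >> in Byte1
  // still yields the correct top byte for negative values.
  public static byte Byte1(this int it) => (byte)(it >> 24);
  public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
  public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
  public static byte Byte4(this int it) => (byte)it;

  // True when each shift-based byte equals the byte stored at the matching
  // memory offset; only meaningful on a little-endian machine.
  public static bool MatchesMemoryLayout(int value) {
    byte[] b = BitConverter.GetBytes(value); // b[0] is the lowest-addressed byte
    return b[3] == value.Byte1() && b[2] == value.Byte2()
        && b[1] == value.Byte3() && b[0] == value.Byte4();
  }
}
```

For example, (-1).Byte1() is 255 and int.MinValue.Byte1() is 0x80, even though the intermediate shifted values are negative.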

EDIT: Fixed the code to make sure the extracted values are actually used. I also commented out three of the four byte reads so that the benchmark measures the struct cast / member access rather than the cost of using the values.

The Results

I ran these as a Release build with optimizations enabled on x64. Environment and results for the original version of the benchmark (before the edit above):

Intel Core i7-3770K CPU 3.50GHz (Ivy Bridge), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3410223 Hz, Resolution=293.2360 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
  DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0


            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  14.47 us | 0.3314 us | 0.2938 us |   1.00 |     0.00 |
  ViaStructPointer | 111.32 us | 0.6481 us | 0.6062 us |   7.70 |     0.15 |
 ViaStructPointer2 | 102.31 us | 0.7632 us | 0.7139 us |   7.07 |     0.14 |
     ViaStructCast |  29.00 us | 0.3159 us | 0.2800 us |   2.01 |     0.04 |
       ViaUnsafeAs |  14.32 us | 0.0955 us | 0.0894 us |   0.99 |     0.02 |

EDIT: New results after fixing the code:

            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  57.51 us | 1.1070 us | 1.0355 us |   1.00 |     0.00 |
  ViaStructPointer | 203.20 us | 3.9830 us | 3.5308 us |   3.53 |     0.08 |
 ViaStructPointer2 | 198.08 us | 1.8411 us | 1.6321 us |   3.45 |     0.06 |
     ViaStructCast |  79.68 us | 1.5478 us | 1.7824 us |   1.39 |     0.04 |
       ViaUnsafeAs |  57.01 us | 0.8266 us | 0.6902 us |   0.99 |     0.02 |

Questions

The benchmark results surprised me, so I have a few questions:

EDIT: Fewer questions remain after altering the code so that the variables actually get used.

1. Why is the pointer stuff ~~so~~ slow?
2. ~~Why is the cast taking twice as long as the baseline case? Aren't implicit/explicit operators inlined?~~
3. How come the new System.Runtime.CompilerServices.Unsafe package (v. 4.5.0) is so fast? I thought it would at least involve a method call...
4. More generally, how can I make essentially a zero-cost struct that would simply act as a "window" onto some memory or a biggish primitive type like UInt64 so that I can more effectively manipulate / read that memory? What's the best practice here?
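On question 3: Unsafe.As<TFrom, TTo>(ref TFrom) performs no conversion; it returns a reference to the same storage, reinterpreted as TTo, so once the JIT inlines it the call is expected to disappear entirely. A sketch illustrating the aliasing, assuming .NET Core 3.0+ (or the Unsafe package) and a little-endian machine; Mask32View, AsDemo, SetThirdByte, and WriteToCopy are hypothetical names:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical 4-byte view type; only the fields used below are declared.
[StructLayout(LayoutKind.Explicit, Size = 4)]
public struct Mask32View {
  [FieldOffset(3)] public byte Byte1;
  [FieldOffset(1)] public byte Byte3;
  [FieldOffset(0)] public byte Byte4;
}

public static class AsDemo {
  // Unsafe.As returns a ref to the SAME storage as 'value', typed as Mask32View.
  // Nothing is converted or copied; writing through the ref changes 'value'.
  public static int SetThirdByte(int value, byte b) {
    ref Mask32View view = ref Unsafe.As<int, Mask32View>(ref value);
    view.Byte3 = b;
    return value;
  }

  // Assigning the result to a plain (non-ref) local makes a copy, as in the
  // question's "var m = Unsafe.As<int, Mask32>(ref i);".
  public static int WriteToCopy(int value, byte b) {
    Mask32View copy = Unsafe.As<int, Mask32View>(ref value);
    copy.Byte3 = b; // modifies the copy only
    return value;
  }
}
```

On a little-endian machine, AsDemo.SetThirdByte(0, 0xAB) returns 0xAB00, showing that the write went straight into the int's storage, while WriteToCopy leaves its input unchanged.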


asked Jun 15 '18 07:06

Fit Dev

1 Answer

When you take the address of a local, the JIT generally has to keep that local on the stack rather than in a register. That is the case here. In the question's ViaStructPointer methods, i itself is kept on the stack. In ViaUnsafeAs, i is copied to a temp and only the temp is kept on the stack. The former is slower because i is also used to control the iteration of the loop, so the loop's increment and comparison have to go through memory as well.

You can get pretty close to ViaUnsafeAs performance with the following code, which explicitly copies the loop counter and takes the address of the copy instead:

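    // Only the copy j is address-exposed, so the loop counter i
    // no longer has to live on the stack.
    public static int ViaStructPointer2()
    {
        int total = 0;

        for (int i = 0; i < count; i++)
        {
            int j = i;               // explicit copy of the loop counter
            var s = (Mask32*)&j;     // take the address of the copy, not of i
            total += s->Byte1;
        }

        return total;                // returning the sum keeps the work observable
    }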

ViaStructPointer  took 00:00:00.1147793
ViaUnsafeAs       took 00:00:00.0282828
ViaStructPointer2 took 00:00:00.0257589
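A related option for question 4, not taken from the answer above and not benchmarked here: a struct that wraps the integer value and computes the bytes with shifts, so no address is ever taken and the result no longer depends on endianness. The expectation is that the JIT inlines the property getters into the same shift-and-mask code as Direct, but that should be confirmed with a benchmark. Bits32 and its members are hypothetical names; readonly struct needs C# 7.2:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical alternative to Mask32: wraps a uint and exposes bytes via shifts,
// so the value never needs to be address-exposed.
public readonly struct Bits32 {
  public readonly uint Value;
  public Bits32(uint value) => Value = value;

  public static implicit operator Bits32(uint v) => new Bits32(v);
  public static implicit operator Bits32(int v) => new Bits32(unchecked((uint)v));
  public static implicit operator uint(Bits32 b) => b.Value;

  // Byte1 is the most significant byte, matching Mask32's naming.
  // The (byte) cast keeps only the low 8 bits, so no & 0xFF is needed.
  public byte Byte1 { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => (byte)(Value >> 24); }
  public byte Byte2 { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => (byte)(Value >> 16); }
  public byte Byte3 { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => (byte)(Value >> 8); }
  public byte Byte4 { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => (byte)Value; }

  // Example of extra functionality an int lacks: test a single bit (0 = least significant).
  public bool this[int bit] => ((Value >> bit) & 1u) != 0;
}
```

For example, Bits32 b = 0x11223344; gives b.Byte3 == 0x33 and b[2] == true (0x44 has bits 2 and 6 set). The same pattern extends to a 64-bit version wrapping a ulong.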

answered Oct 23 '22 09:10

Andy Ayers

Tags: performance, c#, struct, unsafe, c#-7.2