
Is Activating a Struct Without Storing It as a Local Variable Expected to Be Slower than Not Storing It as a Local Variable?

I have encountered a performance issue in .NET Core 2.1 that I am trying to understand. The code for this can be found here:

https://github.com/mike-eee/StructureActivation

Here is the relevant benchmark code, which uses BenchmarkDotNet:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main()
    {
        BenchmarkRunner.Run<Program>();
    }

    [Benchmark(Baseline = true)]
    public uint? Activated() => new Structure(100).SomeValue;

    [Benchmark]
    public uint? ActivatedAssignment()
    {
        var selection = new Structure(100);
        return selection.SomeValue;
    }
}

public readonly struct Structure
{
    public Structure(uint? someValue) => SomeValue = someValue;

    public uint? SomeValue { get; }
}

At first glance, I would expect Activated to be faster because it does not declare a local variable. I have always understood a local variable to carry a performance cost, since space for it has to be located and reserved in the current stack frame.

However, when running the tests, I get the following results:

// * Summary *

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.285 (1803/April2018Update/Redstone4)
Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.402
  [Host]     : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT


              Method |     Mean |     Error |    StdDev | Scaled |
-------------------- |---------:|----------:|----------:|-------:|
           Activated | 4.700 ns | 0.0128 ns | 0.0107 ns |   1.00 |
 ActivatedAssignment | 3.331 ns | 0.0278 ns | 0.0260 ns |   0.71 |

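Activated (the version without a local variable) takes roughly 41% longer than ActivatedAssignment (4.700 ns vs. 3.331 ns); put the other way, ActivatedAssignment takes about 29% less time.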

For reference, here is the IL courtesy of ReSharper's IL Viewer:

.method /*06000002*/ public hidebysig instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> 
  Activated() cil managed 
{
  .custom /*0C00000C*/ instance void [BenchmarkDotNet/*23000002*/]BenchmarkDotNet.Attributes.BenchmarkAttribute/*0100000D*/::.ctor() 
    = (01 00 01 00 54 02 08 42 61 73 65 6c 69 6e 65 01 ) // ....T..Baseline.
    // property bool 'Baseline' = bool(true)
  .maxstack 1
  .locals /*11000001*/ init (
    [0] valuetype StructureActivation.Structure/*02000003*/ V_0
  )

  // [14 31 - 14 59]
  IL_0000: ldc.i4.s     100 // 0x64
  IL_0002: newobj       instance void valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>/*1B000001*/::.ctor(!0/*unsigned int32*/)/*0A00000F*/
  IL_0007: newobj       instance void StructureActivation.Structure/*02000003*/::.ctor(valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>)/*06000005*/
  IL_000c: stloc.0      // V_0
  IL_000d: ldloca.s     V_0
  IL_000f: call         instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> StructureActivation.Structure/*02000003*/::get_SomeValue()/*06000006*/
  IL_0014: ret          

} // end of method Program::Activated

.method /*06000003*/ public hidebysig instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> 
  ActivatedAssignment() cil managed 
{
  .custom /*0C00000D*/ instance void [BenchmarkDotNet/*23000002*/]BenchmarkDotNet.Attributes.BenchmarkAttribute/*0100000D*/::.ctor() 
    = (01 00 00 00 )
  .maxstack 2
  .locals /*11000001*/ init (
    [0] valuetype StructureActivation.Structure/*02000003*/ selection
  )

  // [19 4 - 19 39]
  IL_0000: ldloca.s     selection
  IL_0002: ldc.i4.s     100 // 0x64
  IL_0004: newobj       instance void valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>/*1B000001*/::.ctor(!0/*unsigned int32*/)/*0A00000F*/
  IL_0009: call         instance void StructureActivation.Structure/*02000003*/::.ctor(valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>)/*06000005*/

  // [20 4 - 20 31]
  IL_000e: ldloca.s     selection
  IL_0010: call         instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> StructureActivation.Structure/*02000003*/::get_SomeValue()/*06000006*/
  IL_0015: ret          

} // end of method Program::ActivatedAssignment

Upon inspection, Activated has two newobj whereas ActivatedAssignment only has one, which might be contributing to the difference between the two benchmarks.

My question is: is this expected? I am trying to understand why the benchmark with less code is actually slower than the one with more code. Any guidance/recommendations to ensure that I am following best practices would be greatly appreciated.

Mike-E asked Sep 29 '18 05:09


1 Answer

It's a bit clearer what's happening if you look at the JITted assembly for your methods:

Program.Activated()
L0000: sub rsp, 0x18
L0004: xor eax, eax              // Initialize Structure to {0}
L0006: mov [rsp+0x10], rax       // Store to stack
L000b: mov eax, 0x64             // Load literal 100
L0010: mov edx, 0x1              // Load literal 1
L0015: xor ecx, ecx              // Initialize SomeValue to {0}
L0017: mov [rsp+0x8], rcx        // Store to stack
L001c: lea rcx, [rsp+0x8]        // Load pointer to SomeValue from stack
L0021: mov [rcx], dl             // Set SomeValue.HasValue to 1
L0023: mov [rcx+0x4], eax        // Set SomeValue.Value to 100
L0026: mov rax, [rsp+0x8]        // Load SomeValue's value from stack
L002b: mov [rsp+0x10], rax       // Store it to a different location on stack
L0030: mov rax, [rsp+0x10]       // Return it from that location
L0035: add rsp, 0x18
L0039: ret

Program.ActivatedAssignment()
L0000: push rax
L0001: xor eax, eax              // Initialize SomeValue to {0}
L0003: mov [rsp], rax            // Store to stack
L0007: mov eax, 0x64             // Load literal 100
L000c: mov edx, 0x1              // Load literal 1
L0011: lea rcx, [rsp]            // Load pointer to SomeValue from stack
L0015: mov [rcx], dl             // Set SomeValue.HasValue to 1
L0017: mov [rcx+0x4], eax        // Set SomeValue.Value to 100
L001a: mov rax, [rsp]            // Return SomeValue
L001e: add rsp, 0x8
L0022: ret

Activated() is plainly doing more work, and that's why it's slower. What it boils down to is a lot of stack shuffling (all the references to rsp). I've commented the instructions as best I could, but the Activated() method is a bit convoluted because of the redundant movs. ActivatedAssignment() is much more straightforward.
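The listing above does not say which tool produced it. One way to get comparable output from the benchmark project itself is BenchmarkDotNet's DisassemblyDiagnoser, sketched below. This is a minimal sketch: it assumes the parameterless attribute form and its defaults are acceptable, and the exact output format and supported runtimes depend on the BenchmarkDotNet version in use.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Assumption: the parameterless form is used so the diagnoser's defaults apply.
// The diagnoser asks BenchmarkDotNet to export the JITted code of each benchmark.
[DisassemblyDiagnoser]
public class Program
{
    static void Main() => BenchmarkRunner.Run<Program>();

    [Benchmark(Baseline = true)]
    public uint? Activated() => new Structure(100).SomeValue;

    [Benchmark]
    public uint? ActivatedAssignment()
    {
        var selection = new Structure(100);
        return selection.SomeValue;
    }
}

public readonly struct Structure
{
    public Structure(uint? someValue) => SomeValue = someValue;

    public uint? SomeValue { get; }
}
```

Run it in a Release build, as with any BenchmarkDotNet project; the disassembly is written alongside the other benchmark artifacts.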

Ultimately, you're not actually saving stack space by omitting the local variable. The variable has to exist at some point whether you give it a name or not. The IL code you pasted shows a local variable (they call it V_0) which is the temp created by the C# compiler since you didn't create it explicitly.

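Where the two differ is in how the struct gets built. The version with the compiler-generated temp needs an evaluation stack only one slot deep (.maxstack 1): newobj creates the Nullable<T> and then the Structure on the evaluation stack, and stloc.0 copies the result into V_0, hence the shuffling. The version with the named variable needs two slots (.maxstack 2) because it first pushes the address of selection (ldloca.s), then the Nullable<T>, and calls the constructor directly on the local.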

Ironically, in the version with the pre-reserved local variable for selection, the JIT is able to eliminate the outer structure and deal only with its embedded Nullable<T>, making for cleaner/faster code.
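To make "deal only with its embedded Nullable<T>" concrete, here is a small sketch that puts the two forms from the question next to a hand-written version that skips the struct entirely. NullableOnly, Forms and Demo are hypothetical names introduced for illustration; the sketch only checks that all three forms return the same value and makes no claim about the code the JIT emits for each.

```csharp
using System;

public readonly struct Structure
{
    public Structure(uint? someValue) => SomeValue = someValue;

    public uint? SomeValue { get; }
}

public static class Forms
{
    // Same body as the Activated benchmark: the compiler creates a hidden temp.
    public static uint? Activated() => new Structure(100).SomeValue;

    // Same body as the ActivatedAssignment benchmark: explicit named local.
    public static uint? ActivatedAssignment()
    {
        var selection = new Structure(100);
        return selection.SomeValue;
    }

    // Hypothetical reference point: no wrapper struct at all, just the Nullable<uint>.
    public static uint? NullableOnly()
    {
        uint? value = 100;
        return value;
    }
}

public static class Demo
{
    public static void Main()
    {
        // All three forms are observably equivalent; only the generated code differs.
        Console.WriteLine(Forms.Activated() == Forms.ActivatedAssignment());    // prints True
        Console.WriteLine(Forms.ActivatedAssignment() == Forms.NullableOnly()); // prints True
    }
}
```

Adding NullableOnly as a third [Benchmark] method would give a reference point for how close each form gets to the struct-free version.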

I'm not sure you can deduce any best practices from this example, but I think it's easy enough to see that the C# compiler is the source of the perf difference. The JIT is smart enough to do the right thing with your struct but only if it looks a certain way coming in.

saucecontrol answered Oct 20 '22 17:10