
Func vs custom delegate performance

I am working on some very performance-critical code and have discovered that calling an anonymous method through a custom delegate performs worse than calling the same code through a Func delegate.

public class DelegateTests
{
    public delegate int GetValueDelegate(string test);

    private Func<string, int> getValueFunc;

    private GetValueDelegate getValueDelegate;

    public DelegateTests()
    {
        getValueDelegate = (s) => 42;
        getValueFunc = (s) => 42;                        
    }

    [Benchmark]
    public int CallWithDelegate()
    {
        return getValueDelegate.Invoke("TEST");
    }

    [Benchmark]
    public int CallWithFunc()
    {
        return getValueFunc.Invoke("TEST");
    }
}

BenchmarkDotNet gives:

// * Summary *

BenchmarkDotNet=v0.10.4, OS=Windows 10.0.14393
Processor=Intel Core i7-4770HQ CPU 2.20GHz (Haswell), ProcessorCount=2
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]    : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
  RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0

Job=RyuJitX64  Jit=RyuJit  Platform=X64

           Method |      Mean |     Error |    StdDev |
----------------- |----------:|----------:|----------:|
 CallWithDelegate | 0.9926 ns | 0.0559 ns | 0.0783 ns |
     CallWithFunc | 0.8763 ns | 0.0168 ns | 0.0131 ns |

// * Hints *
Outliers
  DelegateTests.CallWithFunc: RyuJitX64 -> 3 outliers were removed

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements

// ***** BenchmarkRunner: End *****

As we can see, calling the function through a Func delegate is faster than invoking it through GetValueDelegate. I'm trying to find out why it behaves this way. Here is the JIT-optimized machine code for the custom delegate call:

    26:             return getValueDelegate.Invoke("TEST");
00E105C0 8B 49 08             mov         ecx,dword ptr [ecx+8]  
00E105C3 8B 15 C4 22 71 03    mov         edx,dword ptr ds:[37122C4h]  
00E105C9 8B 41 0C             mov         eax,dword ptr [ecx+0Ch]  
00E105CC 8B 49 04             mov         ecx,dword ptr [ecx+4]  
00E105CF FF D0                call        eax  
00E105D1 C3                   ret 

compared to the Func call:

    32:             return getValueFunc.Invoke("TEST");
00E10608 8B 49 04             mov         ecx,dword ptr [ecx+4]  
00E1060B 8B 15 C4 22 71 03    mov         edx,dword ptr ds:[37122C4h]  
00E10611 8B 41 0C             mov         eax,dword ptr [ecx+0Ch]  
00E10614 8B 49 04             mov         ecx,dword ptr [ecx+4]  
00E10617 FF D0                call        eax  
00E10619 C3                   ret 

They look almost identical: the only difference is the offset of the delegate field loaded in the first instruction ([ecx+8] versus [ecx+4]). I'm starting to think that the difference could be inside the Invoke method of the two delegates. Both derive from MulticastDelegate, as all delegate types on the CLR must. Why is one faster than the other?
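As a quick sanity check on the claim that both delegate types share the same base, here is a minimal sketch using reflection. The class name DelegateBaseTypes and its helper methods are made up for illustration. It also checks that Invoke is flagged as runtime-provided on both types, which suggests there is no separate hand-written Invoke body per delegate type that could differ.

```csharp
using System;
using System.Reflection;

public static class DelegateBaseTypes
{
    // Same shape as the custom delegate in the question.
    public delegate int GetValueDelegate(string test);

    // Returns the direct base type of a delegate type.
    public static Type BaseOf(Type delegateType) => delegateType.BaseType;

    // True when the Invoke method's implementation is supplied by the runtime
    // rather than by IL in the assembly.
    public static bool InvokeIsRuntimeProvided(Type delegateType)
    {
        MethodInfo invoke = delegateType.GetMethod("Invoke");
        MethodImplAttributes flags = invoke.GetMethodImplementationFlags();
        return (flags & MethodImplAttributes.CodeTypeMask) == MethodImplAttributes.Runtime;
    }

    public static void Main()
    {
        Console.WriteLine(BaseOf(typeof(GetValueDelegate)));   // System.MulticastDelegate
        Console.WriteLine(BaseOf(typeof(Func<string, int>)));  // System.MulticastDelegate
        Console.WriteLine(InvokeIsRuntimeProvided(typeof(GetValueDelegate)));
        Console.WriteLine(InvokeIsRuntimeProvided(typeof(Func<string, int>)));
    }
}
```

This only shows that the two types are structurally alike; it says nothing about timing.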

UPDATE

Here are the numbers using LegacyJitX86. Please note that I am only interested in WHY there is a difference. By the way, swapping the order of the benchmarks or of the fields does not affect the result.

// * Summary *

BenchmarkDotNet=v0.10.4, OS=Windows 10.0.14393
Processor=Intel Core i7-4770HQ CPU 2.20GHz (Haswell), ProcessorCount=2
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
  LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0

Job=LegacyJitX86  Jit=LegacyJit  Platform=X86
Runtime=Clr

           Method |      Mean |     Error |    StdDev |
----------------- |----------:|----------:|----------:|
 CallWithDelegate | 2.3385 ns | 0.0361 ns | 0.0320 ns |
     CallWithFunc | 2.0144 ns | 0.0410 ns | 0.0384 ns |

// * Hints *
Outliers
  DelegateTests.CallWithDelegate: LegacyJitX86 -> 1 outlier was removed

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements

// ***** BenchmarkRunner: End *****
asked May 03 '17 20:05 by seesharper




1 Answer

Running them with current versions of everything on my machine, I find no consistent winner.

Run #1

In this run, CallWithFunc's mean time plus error came to about 99% of CallWithDelegate's mean time minus error, so I thought there might be some remnant of the old behavior...


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6850K CPU 3.60GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|           Method |     Mean |     Error |    StdDev |
|----------------- |---------:|----------:|----------:|
| CallWithDelegate | 1.103 ns | 0.0024 ns | 0.0019 ns |
|     CallWithFunc | 1.090 ns | 0.0050 ns | 0.0044 ns |
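The "mean plus error versus mean minus error" comparison for Run #1 can be reproduced from the table above. This is only a sketch of the arithmetic; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class IntervalCheck
{
    // Ratio of method A's upper bound (mean + error)
    // to method B's lower bound (mean - error).
    public static double UpperToLowerRatio(double meanA, double errorA,
                                           double meanB, double errorB)
        => (meanA + errorA) / (meanB - errorB);

    public static void Main()
    {
        // Run #1 figures in nanoseconds: A = CallWithFunc, B = CallWithDelegate.
        double ratio = UpperToLowerRatio(1.090, 0.0050, 1.103, 0.0024);

        // A value below 1 means the two intervals do not overlap;
        // here the ratio comes out at roughly 0.995.
        Console.WriteLine(ratio.ToString("F4", CultureInfo.InvariantCulture));
    }
}
```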

Run #2

But when I simply ran it again, the winner flipped, so my guess is that if some factor favors one or the other, it is probably something very specific to the setup.

For example, maybe the faster delegate happens to live on the same cache line as some other frequently used data, while the slower delegate is the only thing keeping its own cache line in the CPU cache. In that case, changing the order of the fields and marking the class with [StructLayout(LayoutKind.Sequential)] might reveal something.


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6850K CPU 3.60GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

|           Method |     Mean |     Error |    StdDev |
|----------------- |---------:|----------:|----------:|
| CallWithDelegate | 1.062 ns | 0.0036 ns | 0.0030 ns |
|     CallWithFunc | 1.094 ns | 0.0039 ns | 0.0034 ns |
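The layout experiment suggested under Run #2 could be sketched roughly as follows. DelegateTestsSwapped is a hypothetical variant of the question's class, not something that was actually benchmarked here. One caveat, stated as an assumption: for classes whose fields are references, the runtime may keep choosing its own managed field order regardless of the attribute, so it is worth confirming the actual offsets in the disassembly before drawing conclusions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical variant of DelegateTests: the two delegate fields are declared
// in the opposite order and the class requests sequential layout.
// Assumption: the runtime may ignore this request for reference-type fields.
[StructLayout(LayoutKind.Sequential)]
public class DelegateTestsSwapped
{
    public delegate int GetValueDelegate(string test);

    // Declared before getValueFunc here; the question declares them the other way round.
    private GetValueDelegate getValueDelegate;

    private Func<string, int> getValueFunc;

    public DelegateTestsSwapped()
    {
        getValueDelegate = (s) => 42;
        getValueFunc = (s) => 42;
    }

    // Add [Benchmark] (BenchmarkDotNet.Attributes) to both methods
    // when running this under BenchmarkDotNet.
    public int CallWithDelegate()
    {
        return getValueDelegate.Invoke("TEST");
    }

    public int CallWithFunc()
    {
        return getValueFunc.Invoke("TEST");
    }
}
```

If the timings follow the field position rather than the delegate type, that would point toward a layout or cache effect rather than anything specific to Func.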
answered Oct 13 '22 01:10 by Joe Amenta