Why is Calli Faster Than a Delegate Call?

I was playing around with Reflection.Emit and found out about the little-used EmitCalli. Intrigued, I wondered whether it's any different from a regular method call, so I whipped up the code below:

using System;
using System.Diagnostics;
using System.Reflection.Emit;
using System.Runtime.InteropServices;
using System.Security;

[SuppressUnmanagedCodeSecurity]
static class Program
{
    const long COUNT = 1 << 22;
    static readonly byte[] multiply = IntPtr.Size == sizeof(int)
        ? new byte[] { 0x8B, 0x44, 0x24, 0x04, 0x0F, 0xAF, 0x44, 0x24, 0x08, 0xC3 }
        : new byte[] { 0x0f, 0xaf, 0xca, 0x8b, 0xc1, 0xc3 };

    static void Main()
    {
        var handle = GCHandle.Alloc(multiply, GCHandleType.Pinned);
        try
        {
            //Make the native method executable
            uint old;
            VirtualProtect(handle.AddrOfPinnedObject(),
                (IntPtr)multiply.Length, 0x40, out old);
            var mulDelegate = (BinaryOp)Marshal.GetDelegateForFunctionPointer(
                handle.AddrOfPinnedObject(), typeof(BinaryOp));

            var T = typeof(uint); //To avoid redundant typing

            //Generate the method
            var method = new DynamicMethod("Mul", T,
                new Type[] { T, T }, T.Module);
            var gen = method.GetILGenerator();
            gen.Emit(OpCodes.Ldarg_0);
            gen.Emit(OpCodes.Ldarg_1);
            gen.Emit(OpCodes.Ldc_I8, (long)handle.AddrOfPinnedObject());
            gen.Emit(OpCodes.Conv_I);
            gen.EmitCalli(OpCodes.Calli, CallingConvention.StdCall,
                T, new Type[] { T, T });
            gen.Emit(OpCodes.Ret);

            var mulCalli = (BinaryOp)method.CreateDelegate(typeof(BinaryOp));

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < COUNT; i++) { mulDelegate(2, 3); }
            Console.WriteLine("Delegate: {0:N0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            for (int i = 0; i < COUNT; i++) { mulCalli(2, 3); }
            Console.WriteLine("Calli:    {0:N0}", sw.ElapsedMilliseconds);
        }
        finally { handle.Free(); }
    }

    delegate uint BinaryOp(uint a, uint b);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtect(
        IntPtr address, IntPtr size, uint protect, out uint oldProtect);
}

I ran the code in x86 mode and x64 mode. The results (elapsed milliseconds)?

32-bit:

  • Delegate version: 994
  • Calli version: 46

64-bit:

  • Delegate version: 326
  • Calli version: 83

I guess the question's obvious by now... why is there such a huge speed difference?


Update:

I created a 64-bit P/Invoke version as well:

  • Delegate version: 284
  • Calli version: 77
  • P/Invoke version: 31

Apparently, P/Invoke is faster... is this a problem with my benchmarking, or is there something going on I don't understand? (I'm in release mode, by the way.)
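On the benchmarking question, one thing worth ruling out is one-time cost: in the program above, the first call through each delegate happens inside the timed loop. A minimal sketch of a timing helper that makes one untimed warm-up call first is shown here. The names `BenchSketch` and `TimeMs` are hypothetical and not part of the original program, and a plain managed multiply stands in for the native one so the sketch runs on its own.

```csharp
using System;
using System.Diagnostics;

// Same shape as the BinaryOp delegate in the program above.
delegate uint BinaryOp(uint a, uint b);

static class BenchSketch
{
    // Hypothetical helper: times `iterations` calls of `op` after one untimed warm-up call.
    public static long TimeMs(BinaryOp op, int iterations, out uint checksum)
    {
        // Warm-up call: JIT compilation and any lazy stub creation happen here,
        // outside the timed region.
        uint sum = op(2, 3);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) { sum += op(2, 3); }
        sw.Stop();

        // Handing the accumulated result back keeps the calls observable.
        checksum = sum;
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Stand-in for mulDelegate / mulCalli: an ordinary managed multiply.
        BinaryOp managedMul = delegate(uint a, uint b) { return a * b; };

        uint checksum;
        long ms = TimeMs(managedMul, 1 << 22, out checksum);
        Console.WriteLine("Managed delegate: {0:N0} ms (checksum {1})", ms, checksum);
    }
}
```

Since `mulDelegate` and `mulCalli` are already `BinaryOp` instances, they could be passed to a helper like this directly. With about four million iterations the warm-up is unlikely to explain a 20x gap, but it removes one variable.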

user541686 asked May 05 '11 05:05


2 Answers

Given your performance numbers, I assume you must be using the 2.0 framework, or something similar? The numbers are much better in 4.0, but the "Marshal.GetDelegate" version is still slower.

The thing is that not all delegates are created equal.

Delegates for managed code functions are essentially just a straight function call (on x86, that's a __fastcall), with the addition of a little "switcheroo" if you're calling a static function (but that's just 3 or 4 instructions on x86).
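To make the two managed cases concrete, here is a small sketch (the `Multiplier` type and its methods are made up for illustration). For an instance method the delegate holds the target object and can pass it along as `this`; for a static method there is no target, which is presumably why the extra argument-shuffling step is needed.

```csharp
using System;

delegate uint BinaryOp(uint a, uint b);

// Hypothetical type, used only to have one instance and one static method to bind to.
class Multiplier
{
    public uint Mul(uint a, uint b) { return a * b; }
    public static uint StaticMul(uint a, uint b) { return a * b; }
}

static class ManagedDelegateSketch
{
    // Delegate bound to an instance method: Target is the Multiplier object.
    public static BinaryOp ForInstance() { return new Multiplier().Mul; }

    // Delegate bound to a static method: Target is null.
    public static BinaryOp ForStatic() { return Multiplier.StaticMul; }

    static void Main()
    {
        BinaryOp inst = ForInstance();
        BinaryOp stat = ForStatic();

        Console.WriteLine(inst(2, 3));          // 6
        Console.WriteLine(stat(2, 3));          // 6
        Console.WriteLine(inst.Target != null); // True
        Console.WriteLine(stat.Target == null); // True
    }
}
```

Both calls stay entirely in managed code, so neither goes through the interop stub that the native delegate uses.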

Delegates created by "Marshal.GetDelegateForFunctionPointer", on the other hand, are a straight function call into a "stub" function, which adds some overhead (marshalling and whatnot) before calling the unmanaged function. In this case there's very little marshalling, and the marshalling for this call appears to be pretty much optimized out in 4.0 (but most likely still goes through the ML interpreter on 2.0). Even in 4.0, though, there's a stack walk demanding unmanaged code permissions that isn't part of your calli delegate.
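If the stack walk is the main remaining cost, one thing to try is putting `[SuppressUnmanagedCodeSecurity]` on the delegate type itself; the attribute is allowed on delegates as well as on classes and methods. The sketch below is only an illustration under that assumption, not a measured fix: the names `StubSketch` and `GetPidFn` are made up, and it wraps `GetCurrentProcessId` from kernel32 (so it is Windows-only) simply to have a real native function pointer to call.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Security;

static class StubSketch
{
    // Assumption: the attribute has to sit on the delegate type to affect
    // delegates created by Marshal.GetDelegateForFunctionPointer.
    [SuppressUnmanagedCodeSecurity]
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate uint GetPidFn();

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr GetModuleHandle(string moduleName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr module, string procName);

    public static bool IsWindows()
    {
        return Environment.OSVersion.Platform == PlatformID.Win32NT;
    }

    // Calls kernel32!GetCurrentProcessId through a delegate wrapping a raw function pointer.
    public static uint CallThroughStub()
    {
        IntPtr kernel32 = GetModuleHandle("kernel32.dll");
        IntPtr fn = GetProcAddress(kernel32, "GetCurrentProcessId");
        if (fn == IntPtr.Zero)
            throw new InvalidOperationException("GetCurrentProcessId not found");

        var getPid = (GetPidFn)Marshal.GetDelegateForFunctionPointer(fn, typeof(GetPidFn));
        return getPid();
    }

    static void Main()
    {
        if (!IsWindows()) { Console.WriteLine("This sketch is Windows-only."); return; }

        // Compare the native result with the managed view of the same process id.
        Console.WriteLine(CallThroughStub() == (uint)Process.GetCurrentProcess().Id);
    }
}
```

Whether the class-level attribute in the question's code also reaches the nested `BinaryOp` type is something I have not verified, so timing both variants would be the way to find out.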

I've generally found that, short of knowing someone on the .NET dev team, your best bet for figuring out what's going on with managed/unmanaged interop is to do a little digging with WinDbg and SOS.

Kevin answered Oct 23 '22 03:10


Difficult to answer :) Anyway I will try.

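EmitCalli is faster because it emits a raw IL calli instruction, so the call goes straight to the function pointer. I suspect SuppressUnmanagedCodeSecurity also helps, since it skips the security stack walk that demands unmanaged code permission on each call. (It does not turn off things like array bounds checks; it only removes that permission check.) So the call is less safe but runs at full speed.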

The delegate version will have some compiled code to check typing, and will also do a de-reference call (because the delegate is like a typed-function pointer).

My two cents!

daitangio answered Oct 23 '22 03:10