For my C++/CLI project I just tried to measure the cost of C++/CLI function pointers versus .NET delegates.
I expected C++/CLI function pointers to be faster than .NET delegates, so my test counts the invocations of the .NET delegate and of the native function pointer separately, each over a period of 5 seconds.
Now the results were (and still are) shocking to me:
That means the native C++/CLI function pointer is almost 3x slower than a managed delegate called from within C++/CLI code. How can that be? Should I use managed constructs for interfaces, delegates, or abstract classes in performance-critical sections?
The function which gets called continuously:
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
The code that invokes the method uses all the parameters as well as the return value, so that (hopefully) nothing gets optimized away. Here's the code for .NET delegates:
__int64 executions;
__int64 result;
System::Diagnostics::Stopwatch^ w = gcnew System::Diagnostics::Stopwatch();
System::Func<int, __int64, __int64>^ managedPtr = gcnew System::Func<int, __int64, __int64>(&DoIt);
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += managedPtr(i, executions);
    executions++;
}
System::Console::WriteLine(".NET delegate: {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);
The C++ function pointer is invoked in the same way as the .NET delegate:
typedef __int64 (* DoItMethod)(int n, __int64 sum);
DoItMethod nativePtr = DoIt;
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += nativePtr(i, executions);
    executions++;
}
System::Console::WriteLine("Function pointer: {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);
All tests done:
The direct call to "DoIt" is represented here by "Function call", which seems to get inlined by the compiler, as there is no (significant) difference in execution counts compared to a call to the inlined function.
Calls to C++ virtual methods are as 'slow' as the function pointer. A virtual method of a managed class (ref class) is as fast as the .NET delegate.
Update: I dug a little deeper, and it seems that in the tests with unmanaged functions, the transition to native code happens every time the DoIt function is called. I therefore wrapped the inner loop in another function, which I forced to compile as unmanaged code:
#pragma managed(push, off)
__int64 TestCall(__int64* executions)
{
    __int64 result = 0;
    for (int i=0; i < 1000000; i++)
        result += DoItNative(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)
Additionally, I tested std::function like this:
#pragma managed(push, off)
__int64 TestStdFunc(__int64* executions)
{
    __int64 result = 0;
    std::function<__int64(int, __int64)> func(DoItNative);
    for (int i=0; i < 1000000; i++)
        result += func(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)
Now, the new results are:
std::function is a bit disappointing.
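Outside of C++/CLI, the same three call paths (direct call, function pointer, std::function) can be compared in standard C++. The following is only a minimal sketch under stated assumptions, not the code behind the measurements in this post: do_it mirrors DoIt with std::int64_t in place of __int64, and run_batch, time_batch_us, Timings, and compare_call_paths are hypothetical names introduced for illustration. An optimizing compiler may resolve the function pointer at compile time, so the numbers are only a rough indication.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Same workload as DoIt in the question, with a portable 64-bit integer type.
std::int64_t do_it(int n, std::int64_t sum) {
    return (n % 3 == 0) ? sum + n : sum + 1;
}

// Hypothetical helper: invokes `call` one million times and sums the results.
template <typename Callable>
std::int64_t run_batch(Callable&& call, std::int64_t executions) {
    std::int64_t result = 0;
    for (int i = 0; i < 1000000; ++i)
        result += call(i, executions);
    return result;
}

// Hypothetical helper: times a single batch and returns microseconds.
template <typename Callable>
long long time_batch_us(Callable&& call, std::int64_t& result_out) {
    auto start = std::chrono::steady_clock::now();
    result_out = run_batch(call, 0);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

struct Timings {
    long long direct_us;
    long long pointer_us;
    long long std_function_us;
};

// Runs one batch per call path and checks that all paths compute the same sum.
Timings compare_call_paths() {
    std::int64_t (*ptr)(int, std::int64_t) = do_it;
    std::function<std::int64_t(int, std::int64_t)> fn(do_it);
    std::int64_t r1 = 0, r2 = 0, r3 = 0;

    Timings t{};
    // The lambda calls do_it by name, so the compiler is free to inline it.
    t.direct_us = time_batch_us(
        [](int n, std::int64_t s) { return do_it(n, s); }, r1);
    t.pointer_us = time_batch_us(ptr, r2);
    t.std_function_us = time_batch_us(fn, r3);

    assert(r1 == r2 && r2 == r3);
    return t;
}
```

Calling compare_call_paths() from main and printing the three fields gives a rough comparison. The result depends heavily on the compiler and its optimization flags, and a single batch is noisy, so repeated runs are advisable.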
Using a function pointer is slower than calling a function directly because it adds another layer of indirection: the pointer has to be dereferenced to get the address of the function. Compared to everything else your program may do (reading a file, writing to the console), that cost is usually negligible.
1) Unlike normal pointers, a function pointer points to code, not data. Typically, a function pointer stores the start address of executable code. 2) Unlike with normal pointers, we do not allocate or de-allocate memory through function pointers. 3) A function's name can also be used to get the function's address.
Delegates in C# are similar to function pointers in C++, but C# delegates are type safe. You can pass methods as parameters to a delegate to allow the delegate to point to the method. Delegates are used to define callback methods and implement event handling, and they are declared using the “delegate” keyword.
Function pointers can be useful when you want to create a callback mechanism and need to pass the address of a function to another function. They can also be useful when you want to store an array of functions, for example to call them dynamically.
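Both uses mentioned above, a callback passed to another function and an array of functions selected at run time, can be sketched in a few lines of standard C++. All names here (BinaryOp, add, multiply, fold, kOps) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Function pointer type: takes two ints and returns an int.
using BinaryOp = int (*)(int, int);

int add(int a, int b) { return a + b; }
int multiply(int a, int b) { return a * b; }

// Callback mechanism: the caller decides which operation fold applies.
int fold(const int* values, std::size_t count, int initial, BinaryOp op) {
    assert(op != nullptr);  // a null callback would be a caller error
    int acc = initial;
    for (std::size_t i = 0; i < count; ++i)
        acc = op(acc, values[i]);
    return acc;
}

// Array of functions, so an operation can be picked by index at run time.
const BinaryOp kOps[] = { add, multiply };
```

With an array holding 1, 2, 3, 4, passing add with an initial value of 0 sums the elements, and passing multiply with an initial value of 1 multiplies them; kOps[1](6, 7) calls multiply through the table.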
You are seeing the cost of "double thunking". The core problem is that your DoIt() function is compiled as managed code. The delegate call is very fast because going from managed code to managed code through a delegate is uncomplicated. The function pointer is slow, however: the compiler automatically generates code that first switches from managed code to unmanaged code and makes the call through the function pointer. That call then ends up in a stub that switches from unmanaged code back to managed code and calls DoIt().
Presumably what you really meant to measure was a call to native code. Use a #pragma to force DoIt() to be generated as machine code, like this:
#pragma managed(push, off)
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
#pragma managed(pop)
You'll now see that the function pointer is faster than a delegate.
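If DoIt() has to stay managed, a different route (not the one taken above, and untested against the benchmark in the question) is to declare the pointer type with the __clrcall calling convention, which Visual C++ provides so that a pointer to a managed function is called through its managed entry point rather than through the native thunk. The following is a sketch under assumptions: MANAGED_CALL and CallBatch are hypothetical names, long long stands in for __int64, and the macro expands to nothing outside /clr so the code also builds as standard C++.

```cpp
#include <cassert>

// Under /clr the compiler defines _MANAGED; elsewhere the macro is empty,
// so this sketch degrades to an ordinary function pointer.
#if defined(_MANAGED)
#define MANAGED_CALL __clrcall
#else
#define MANAGED_CALL
#endif

// Left as managed code when compiled with /clr (no #pragma managed(off)).
long long DoIt(int n, long long sum) {
    return (n % 3 == 0) ? sum + n : sum + 1;
}

// Under /clr this pointer type refers to the managed entry point of the
// target, so a managed caller does not switch to native code and back.
typedef long long (MANAGED_CALL *DoItMethod)(int n, long long sum);

// Assumed helper: one batch of a million calls through the pointer.
long long CallBatch(DoItMethod fn, long long executions) {
    assert(fn != nullptr);
    long long result = 0;
    for (int i = 0; i < 1000000; ++i)
        result += fn(i, executions);
    return result;
}
```

A __clrcall pointer can only be called from managed code, so this variant suits the case where both caller and callee are managed; the #pragma approach above remains the choice when the callee should be native.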