Performance of C++/CLI function pointers versus .NET delegates

For my C++/CLI project I just tried to measure the cost of C++/CLI function pointers versus .NET delegates.

I expected C++/CLI function pointers to be faster than .NET delegates. My test therefore separately counts how many invocations of the .NET delegate and of the native function pointer complete within 5 seconds.

Results

Now the results were (and still are) shocking to me:

.NET delegate: 910M executions with result 152080413333030 in 5003ms
Function pointer: 347M executions with result 57893422166551 in 5013ms

That means using the native C++/CLI function pointer is almost 3x slower than using a managed delegate from within C++/CLI code. How can that be? Should I really use managed constructs when it comes to interfaces, delegates or abstract classes in performance-critical sections?

The test code

The function which gets called continuously:

__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

The code that invokes the method uses all of the parameters as well as the return value, so that (hopefully) nothing gets optimized away. Here's the code for .NET delegates:

__int64 executions;
__int64 result;
System::Diagnostics::Stopwatch^ w = gcnew System::Diagnostics::Stopwatch();

System::Func<int, __int64, __int64>^ managedPtr = gcnew System::Func<int, __int64, __int64>(&DoIt);
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += managedPtr(i, executions);
    executions++;
}
System::Console::WriteLine(".NET delegate:       {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);

The C++ function pointer is invoked in the same way as the .NET delegate:

typedef __int64 (* DoItMethod)(int n, __int64 sum);

DoItMethod nativePtr = DoIt;
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += nativePtr(i, executions);
    executions++;
}
System::Console::WriteLine("Function pointer:    {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);

Additional info

Compiled with Visual Studio 2012
.NET Framework 4.5 was targeted
Release build (execution counts stay proportional for Debug builds)
Calling convention is __stdcall (__fastcall not allowed when the project gets compiled with CLR support)

All tests done:

.NET virtual method: 1025M executions with result 171358304166325 in 5004ms
.NET delegate: 910M executions with result 152080413333030 in 5003ms
Virtual method: 336M executions with result 56056335999888 in 5006ms
Function pointer: 347M executions with result 57893422166551 in 5013ms
Function call: 1459M executions with result 244230520832847 in 5001ms
Inlined function: 1385M executions with result 231791984166205 in 5000ms

The direct call to "DoIt" is represented here by "Function call", which seems to get inlined by the compiler, as there is no (significant) difference in execution counts compared to a call to the inlined function.

Calls to C++ virtual methods are as 'slow' as the function pointer. A virtual method of a managed class (ref class) is as fast as the .NET delegate.

Update: I dug a little deeper, and it seems that in the tests with unmanaged functions, the transition to native code happens every time the DoIt function gets called. I therefore wrapped the inner loop in another function, which I forced to compile as unmanaged code:

#pragma managed(push, off)
__int64 TestCall(__int64* executions)
{
    __int64 result = 0;
    for (int i=0; i < 1000000; i++)
        result += DoItNative(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)

Additionally, I tested std::function like this:

#pragma managed(push, off)
__int64 TestStdFunc(__int64* executions)
{
    __int64 result = 0;
    std::function<__int64(int, __int64)> func(DoItNative);
    for (int i=0; i < 1000000; i++)
        result += func(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)

Now, the new results are:

Function call: 2946M executions with result 495340439997054 in 5000ms
std::function: 160M executions with result 26679519999840 in 5018ms

std::function is a bit disappointing.
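To check how much of that gap comes from std::function itself rather than from anything CLR-related, the same comparison can be repeated in a plain native program. The following is only a sketch under assumptions: TimeCalls and Timing are hypothetical names that are not part of the original test, std::int64_t stands in for __int64, and the batch sizes are fixed instead of time-bounded so that both runs do the same amount of work.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Same logic as DoIt from the question, using a portable 64-bit type.
std::int64_t DoItNative(int n, std::int64_t sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

// Hypothetical result record: the accumulated value and the elapsed time.
struct Timing
{
    std::int64_t result;
    long long microseconds;
};

// Hypothetical helper: runs `batches` outer iterations of `callsPerBatch`
// calls through any callable (function pointer, std::function, ...).
template <typename Callable>
Timing TimeCalls(Callable call, int batches, int callsPerBatch)
{
    const auto start = std::chrono::steady_clock::now();
    std::int64_t result = 0;
    for (std::int64_t executions = 0; executions < batches; executions++)
    {
        for (int i = 0; i < callsPerBatch; i++)
            result += call(i, executions);
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return Timing{ result, static_cast<long long>(elapsed.count()) };
}
```

Calling TimeCalls(&DoItNative, 100, 1000000) and TimeCalls(std::function<std::int64_t(int, std::int64_t)>(DoItNative), 100, 1000000) should give identical result values, so only the microseconds differ. Keep in mind that an optimizer may see through a function pointer whose target is known at compile time, so the numbers are an indication at best.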

asked Nov 18 '12 18:11

uebe

1 Answer

You are seeing the cost of "double thunking". The core problem is that your DoIt() function is being compiled as managed code. The delegate call is very fast because going from managed code to managed code through a delegate is uncomplicated. The function pointer is slow, however: the compiler automatically generates code that first switches from managed code to unmanaged code in order to make the call through the function pointer. That call then ends up in a stub that switches from unmanaged code back to managed code and calls DoIt().

Presumably what you really meant to measure was a call to native code. Use a #pragma to force DoIt() to be generated as machine code, like this:

#pragma managed(push, off)
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
#pragma managed(pop)

You'll now see that the function pointer is faster than a delegate.
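If DoIt() has to stay managed, another option that may be worth trying is to declare the function pointer with the __clrcall calling convention, which Visual C++ documents as a way to call managed code through a pointer without going through the native entry point. The sketch below is an assumption-laden illustration rather than a tested /clr build: MANAGED_ONLY_CALL, DoItManagedMethod and RunBatch are hypothetical names, long long stands in for __int64, and outside of /clr the macro expands to nothing so the code is just an ordinary function pointer.

```cpp
#include <cassert>

// __cplusplus_cli is predefined by Visual C++ when compiling with /clr.
// In any other build the macro is empty and the pointer is a plain one.
#ifdef __cplusplus_cli
#define MANAGED_ONLY_CALL __clrcall
#else
#define MANAGED_ONLY_CALL
#endif

// Same logic as DoIt from the question; under /clr this compiles as managed code.
long long DoIt(int n, long long sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

// Hypothetical typedef: a pointer that, under /clr, may only be invoked from
// managed code, so no managed -> native -> managed round trip is needed.
typedef long long (MANAGED_ONLY_CALL *DoItManagedMethod)(int n, long long sum);

// Hypothetical helper mirroring the inner loop of the original test.
long long RunBatch(DoItManagedMethod ptr, int count, long long executions)
{
    long long result = 0;
    for (int i = 0; i < count; i++)
        result += ptr(i, executions);
    return result;
}
```

In the test loop this would replace the DoItMethod typedef, for example DoItManagedMethod managedFnPtr = &DoIt; followed by result += RunBatch(managedFnPtr, 1000000, executions);. Whether it matches the delegate's speed would have to be measured.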

answered Oct 06 '22 00:10

Hans Passant


Tags: performance, .net, delegates, c++-cli, mixed-mode