
Managed to unmanaged overhead

In .NET there are several places where you must leave managed code and enter the realm of unmanaged, a.k.a. native, code. To name a few:

  • extern dll functions
  • COM invocation

There are always comments about the overhead that jumping from one side to the other causes. My question here is whether anybody has MEASURED the exact overhead involved, and can explain how it can be calculated. For example, maybe a byte[] can be converted to an IntPtr, or even to a byte*, in .NET to help the marshaller save some CPU cycles.

asked Aug 16 '11 by Daniel Mošmondor

2 Answers

Getting the address of a managed array is indeed possible.

First you have to pin the array using System.Runtime.InteropServices.GCHandle, so that the garbage collector doesn't move the array around. You must keep this handle allocated as long as the unmanaged code has access to the managed array.

byte[] the_array = ... ;
GCHandle pin = GCHandle.Alloc(the_array, GCHandleType.Pinned);

Then, you should be able to use System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement to get an IntPtr to any element in the array.

IntPtr p = Marshal.UnsafeAddrOfPinnedArrayElement(the_array,0);
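
Putting the two steps together, here is a minimal sketch of the full pin/use/free pattern. The native library name and the FillBuffer function are hypothetical stand-ins for whatever unmanaged call you are actually making:

    using System;
    using System.Runtime.InteropServices;

    class PinExample
    {
        // Hypothetical native function that fills a raw buffer.
        [DllImport("native.dll")]
        static extern void FillBuffer(IntPtr buffer, int length);

        static void Main()
        {
            byte[] the_array = new byte[4096];
            GCHandle pin = GCHandle.Alloc(the_array, GCHandleType.Pinned);
            try
            {
                IntPtr p = Marshal.UnsafeAddrOfPinnedArrayElement(the_array, 0);
                FillBuffer(p, the_array.Length);
            }
            finally
            {
                // Free the handle as soon as the unmanaged side is done,
                // so the GC can move the array again.
                pin.Free();
            }
        }
    }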

Important: Pinning objects seriously disrupts GC operation. Being able to move objects around in the heap is one of the reasons why modern GCs can (somewhat) keep up with manual memory management. By pinning objects in the managed heap, the GC loses its one performance advantage over manual memory management: a relatively unfragmented heap.

So if you plan to keep these arrays "on the unmanaged side" for some time, consider making a copy of the array instead. Copying memory is surprisingly fast. Use the Marshal.Copy(*) methods to copy from managed to unmanaged memory and vice-versa.
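
As a sketch of that copying approach (the buffer size here is arbitrary): allocate an unmanaged block, copy in, and copy back out once the native code is done:

    using System;
    using System.Runtime.InteropServices;

    class CopyExample
    {
        static void Main()
        {
            byte[] managed = new byte[4096];

            // Allocate an unmanaged buffer of the same size.
            IntPtr unmanaged = Marshal.AllocHGlobal(managed.Length);
            try
            {
                // Managed -> unmanaged.
                Marshal.Copy(managed, 0, unmanaged, managed.Length);

                // ... hand 'unmanaged' to native code for as long as needed;
                //     the GC is free to move 'managed' in the meantime ...

                // Unmanaged -> managed, to pick up any results.
                Marshal.Copy(unmanaged, 0, managed, managed.Length);
            }
            finally
            {
                Marshal.FreeHGlobal(unmanaged);
            }
        }
    }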

answered by Christian Klauser


[I see I didn't really answer the question about how you would measure. The best way is simple instrumentation, either with the instrumentation classes (see: http://msdn.microsoft.com/en-us/library/aa645516(v=vs.71).aspx) or with something as basic as placing timers around whatever calls you're interested in. In the crudest form, when we were trying to find the performance hit taken in our calls between C# and ATL COM, we would start a timer, call an empty ATL COM function from C# in a tight loop, with enough iterations to get reasonably consistent numbers between runs, and then do the exact same thing in C++. The difference between those two numbers is the overhead of making the call across that boundary.]
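
A minimal sketch of that kind of crude timing harness, using Stopwatch against a hypothetical empty native function (NoOp in native.dll stands in for whatever boundary call you care about):

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;

    class InteropTimer
    {
        // Hypothetical empty native function; substitute the P/Invoke
        // or COM call whose overhead you want to measure.
        [DllImport("native.dll")]
        static extern void NoOp();

        static void Main()
        {
            const int Iterations = 10000000;

            NoOp(); // warm up: loader, JIT, marshalling stub

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
                NoOp();
            sw.Stop();

            Console.WriteLine("ns per call: {0:F1}",
                sw.Elapsed.TotalMilliseconds * 1000000.0 / Iterations);

            // Run the same loop against the same empty function from C++;
            // the difference between the two figures is the transition cost.
        }
    }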

I don't really have any hard numbers, but I can say from previous experience that, as long as you use things in an efficient way, C# performs with very little, if any, overhead beyond what one might expect in C++, depending on the exact nature of what you are trying to do.

I worked on several applications that collected very large amounts of ultrasonic data through very high-frequency A/D boards (100MHz-3GHz), doing certain things as you suggest: byte[] arrays allocated in managed code, locked down as pointers, and passed as buffers for the data; transferring large amounts of data and processing it to image various parts.

Way back when, we exposed C++ code to VB6 by wrapping it in ATL Simple COM objects and passing pointers back and forth as needed for data and imaging. We used similar techniques much later with C# in VS.NET 2003. I have also written a library for a question here that allows for massive unmanaged data storage, supporting very large arrays and array operations, and even a lot of LINQ-type functionality: Using array fields instead of massive number of objects. (Note: there are some reference-counting issues in the latest version that I have not yet tracked down.)

Also, I have done some interfacing with the FFTW library using ATL COM to perform high-performance DSP, to good effect. Although that library is not quite ready for prime time, it was the basis of the spike solution I created for the link above, and the spike gave me most of the information I was looking for to complete my much more full-blown unmanaged memory allocator with fast-array processing, supporting both externally allocated buffers and allocations from the unmanaged heap, which will eventually replace the processing that currently exists in the FFTW C# library.

So, the point is, I believe the performance penalty is very overblown, especially with the processing power we have these days. In fact, you can get very good performance from C# by itself if you just take care to avoid some of the pitfalls, like making lots of little cross-boundary calls instead of passing buffers, or repeatedly allocating strings and the like. When it comes to high-speed processing, C# has fit the bill in all of the scenarios I've mentioned. Does it take a little forethought? Yes, sometimes. But given the advantages in development speed, maintainability, and understandability, the time I have spent figuring out how to get the performance I need has always been far less than the time it would have taken to develop primarily or completely in C++.
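
To illustrate the chunky-versus-chatty point, here is a sketch against a hypothetical native DSP API (dsp.dll, ProcessSample, and ProcessBuffer are made up for the example); the second form pays the transition cost once instead of once per sample:

    using System.Runtime.InteropServices;

    class ChunkyVsChatty
    {
        // Hypothetical native functions: one handles a single sample,
        // the other a whole buffer in one call.
        [DllImport("dsp.dll")]
        static extern double ProcessSample(double sample);

        [DllImport("dsp.dll")]
        static extern void ProcessBuffer(double[] samples, int count);

        static void Main()
        {
            double[] data = new double[1000000];

            // Chatty: one managed/unmanaged transition per sample.
            for (int i = 0; i < data.Length; i++)
                data[i] = ProcessSample(data[i]);

            // Chunky: a single transition for the whole buffer; a
            // blittable double[] is pinned rather than copied.
            ProcessBuffer(data, data.Length);
        }
    }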

My two bits. (Oh, one caveat: I mention ATL COM specifically because the performance hit you took when using MFC was not worth it. As I recall, calling out via an MFC COM object was about two orders of magnitude slower than calling the interface on the ATL one, and did not meet our needs. ATL, on the other hand, was only a smidge slower than calling the equivalent function directly in C++. Sorry, I don't recall any particular numbers offhand, other than that even with the large amounts of ultrasonic data we were collecting and moving around, we did not find it to be a bottleneck.)


Oh, and I found this: http://msdn.microsoft.com/en-us/library/ms973839.aspx "Performance Tips and Tricks in .NET Applications". This quote is very interesting:

To speed up transition time, try to make use of P/Invoke when you can. The overhead is as little as 31 instructions plus the cost of marshalling if data marshalling is required, and only 8 otherwise. COM interop is much more expensive, taking upwards of 65 instructions.

Sample section titles: "Make Chunky Calls", "Use For Loops for String Iteration", "Be on the Lookout for Asynchronous IO Opportunities".


Some snippets from the referenced Fast Memory Library:

in MemoryArray.cs

    public MemoryArray(int parElementCount, int parElementSize_bytes)
    {
        Descriptor =
            new MemoryArrayDescriptor
                (
                    Marshal.AllocHGlobal(parElementCount * parElementSize_bytes),
                    parElementSize_bytes,
                    parElementCount
                );
    }

    protected override void OnDispose()
    {
        if (Descriptor.StartPointer != IntPtr.Zero)
            Marshal.FreeHGlobal(Descriptor.StartPointer);

        base.OnDispose();
    }

    // this really should only be used for random access to the items, if you want sequential access
    // use the enumerator which uses pointer math via the array descriptor's TryMoveNext call.
    //
    // i haven't figured out exactly where it would go, but you could also do something like 
    // having a member MemoryArrayItem that gets updated here rather than creating a new one each
    // time; that would break anything that was trying to hold on to a reference to the item because
    // it will no longer be immutable.
    //
    // that could be remedied by something like a call that would return a new copy of the item if it
    // was to be held onto.  i would definitely need to see that i needed the performance boost and
    // that it was significant enough before i would contradict the users expectations on that one.

    public MemoryArrayItem this[int i]
    {
        get
        {
            return new MemoryArrayItem(this, Descriptor.GetElementPointer(i), Descriptor.ElementSize_bytes);
        }
    }

    // you could also do multiple dimension indexing; to do so you would have to pass in dimensions somehow in
    // the constructor and store them.
    //
    // there's all sorts of stuff you could do with this; take various slices, etc, do switching between
    // last-to-first/first-to-last/custom dimension ordering, etc, but i didn't tackle that for the example.
    //
    // if you don't need to error check here, then you could always do something like:
    public MemoryArrayItem this[int x, int y]
    {
        get
        {
            if (myDimensions == null)
                throw new ArrayTypeMismatchException("attempted to index two dimensional array without calling SetDimensions()");

            if (myDimensions.Length != 2)
                throw new ArrayTypeMismatchException("currently set dimensions do not provide a two dimensional array. [dimension: " + myDimensions.Length + "]");

            int RowSize_bytes = myDimensions[0] * Descriptor.ElementSize_bytes;

            return new MemoryArrayItem(this, Descriptor.StartPointer + (y * RowSize_bytes) + x * Descriptor.ElementSize_bytes, Descriptor.ElementSize_bytes);
        }
    }

    public void SetDimensions(int[] parDimensions)
    {
        if (parDimensions == null || parDimensions.Length <= 0)
            throw new ArgumentException("unable to set the array to zero dimensions.", "parDimensions");

        for (int i = 0; i < parDimensions.Length; ++i)
            if (parDimensions[i] <= 0)
                throw new ArgumentOutOfRangeException("parDimensions", "unable to set dimension at index " + i + " to " + parDimensions[i] + ".");

        myDimensions = new int[parDimensions.Length];
        parDimensions.CopyTo(myDimensions, 0);
    }
    private int[] myDimensions = null;
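
A hypothetical usage sketch, assuming (from the rest of the linked library) that MemoryArray's base class implements IDisposable and that MemoryArrayItem wraps the element pointer:

    // 512x512 grid of 4-byte elements, stored on the unmanaged heap.
    using (MemoryArray array = new MemoryArray(512 * 512, 4))
    {
        array.SetDimensions(new int[] { 512, 512 });

        // Indexing does pointer math into the unmanaged block; no managed
        // allocation is made for the element data itself.
        MemoryArrayItem item = array[10, 20];

        // ... pass the item's pointer to native imaging code ...
    }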

from MemoryArrayEnumerator.cs

public class MemoryArrayEnumerator :
    IEnumerator<MemoryArrayItem>
{
    // handles reference counting for the main array 
    private AutoReference<MemoryArray> myArray;
    private MemoryArray Array { get { return myArray; } }

    private IntPtr myCurrentPosition = IntPtr.Zero;

    public MemoryArrayEnumerator(MemoryArray parArray)
    {
        myArray = AutoReference<MemoryArray>.CreateFromExisting(parArray);
    }

    //---------------------------------------------------------------------------------------------------------------
    #region IEnumerator<MemoryArrayItem> implementation
    //---------------------------------------------------------------------------------------------------------------
    public MemoryArrayItem Current
    {
        get 
        {
            if (Array.Descriptor.CheckPointer(myCurrentPosition))
                return new MemoryArrayItem(myArray, myCurrentPosition, Array.Descriptor.ElementSize_bytes);
            else
                throw new IndexOutOfRangeException("Enumerator Error: Current() was out of range");
        }
    }

    public void Dispose()
    {
        myArray.Dispose();
    }

    object System.Collections.IEnumerator.Current
    {
        // defer to the strongly typed Current above, so non-generic
        // IEnumerator consumers work too
        get { return Current; }
    }

    public bool MoveNext()
    {
        bool RetVal = true;

        if (myCurrentPosition == IntPtr.Zero)
            myCurrentPosition = Array.Descriptor.StartPointer;
        else
            RetVal = Array.Descriptor.TryMoveNext(ref myCurrentPosition);

        return RetVal;
    }

    public void Reset()
    {
        myCurrentPosition = IntPtr.Zero;
    }
    //---------------------------------------------------------------------------------------------------------------
    #endregion IEnumerator<MemoryArrayItem> implementation
    //---------------------------------------------------------------------------------------------------------------
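
And a hypothetical usage sketch, assuming MemoryArray exposes a GetEnumerator() that returns this enumerator (so it works with foreach):

    using (MemoryArray array = new MemoryArray(1024, 8))
    {
        // Sequential access via the enumerator's pointer math, rather
        // than repeated random-access indexing.
        foreach (MemoryArrayItem item in array)
        {
            // ... read/write the unmanaged element through 'item' ...
        }
    }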
answered by shelleybutterfly