
What's the most efficient way to marshal C++ structs to C#?

I am about to begin reading tons of binary files, each with 1000 or more records. New files are added constantly, so I'm writing a Windows service to monitor the directories and process new files as they arrive. The files were created with a C++ program. I've recreated the struct definitions in C# and can read the data fine, but I'm concerned that the way I'm doing it will eventually kill my application.

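// Requires: using System.IO; using System.Runtime.InteropServices;
using (BinaryReader br = new BinaryReader(File.Open("myfile.bin", FileMode.Open)))
{
    long pos = 0L;
    long length = br.BaseStream.Length;

    CPP_STRUCT_DEF record;
    int recordSize = Marshal.SizeOf(typeof(CPP_STRUCT_DEF));
    byte[] buffer;
    GCHandle pin;

    // Stop before a trailing partial record; a short buffer would make
    // PtrToStructure read past the end of the array.
    while (pos + recordSize <= length)
    {
        // ReadBytes allocates and returns a new array on every call.
        buffer = br.ReadBytes(recordSize);
        pin = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            record = (CPP_STRUCT_DEF)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(CPP_STRUCT_DEF));
        }
        finally
        {
            pin.Free(); // always release the pin, even if marshalling throws
        }

        pos += recordSize;

        /* Do stuff with my record */
    }
}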

I don't think I need to use GCHandle, because I'm not actually communicating with the C++ app; everything is being done from managed code. However, I don't know of an alternative method.

scottm asked May 18 '09 14:05


3 Answers

Using Marshal.PtrToStructure is rather slow. I found the following CodeProject article, which compares and benchmarks different ways of reading binary data, very helpful:

Fast Binary File Reading with C#

Dirk Vollmar answered Oct 15 '22 06:10


For your particular application, only one thing will give you the definitive answer: Profile it.

That being said, here are the lessons I've learned while working with large PInvoke solutions. The most effective way to marshal data is to marshal fields which are blittable, meaning the CLR can simply do what amounts to a memcpy to move data between native and managed code. In simple terms, get all of the non-inline arrays and strings out of your structures. If they are present in the native structure, represent them with an IntPtr and marshal the values on demand into managed code.
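To illustrate the IntPtr advice, here is a minimal sketch. The NativeRecord layout, its field names and the ReadName helper are all made up for the example, and the string is allocated from managed code only as a stand-in for memory that native code would normally own. Note that this pattern applies to live PInvoke calls; a pointer stored in a file on disk would be meaningless.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native layout: struct { int id; double value; const char* name; }
[StructLayout(LayoutKind.Sequential)]
struct NativeRecord
{
    public int Id;
    public double Value;
    public IntPtr Name; // raw pointer instead of a string field, so the struct stays blittable

    // Convert the native string only when a caller actually needs it.
    public string ReadName()
    {
        return Name == IntPtr.Zero ? null : Marshal.PtrToStringAnsi(Name);
    }
}

static class BlittableDemo
{
    static void Main()
    {
        // Stand-in for a string that native code would normally own.
        IntPtr nativeName = Marshal.StringToHGlobalAnsi("sensor-1");
        try
        {
            NativeRecord r = new NativeRecord { Id = 7, Value = 1.5, Name = nativeName };
            Console.WriteLine(r.Id);         // no string marshalling has happened yet
            Console.WriteLine(r.ReadName()); // the string is copied here, on demand
        }
        finally
        {
            Marshal.FreeHGlobal(nativeName);
        }
    }
}
```

Because the struct holds only primitive fields and an IntPtr, the marshaller has nothing to convert when the struct itself crosses the boundary; the cost of the string copy is paid only by callers that ask for it.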

I have never profiled the difference between using Marshal.PtrToStructure and having a native API dereference the value. That comparison is probably worth investing in if profiling reveals PtrToStructure to be a bottleneck.

For large hierarchies, marshal on demand rather than pulling the entire structure into managed code at once. I've run into this issue the most when dealing with large tree structures. Marshalling an individual node is very fast if it's blittable, and performance-wise it pays to marshal only what you need at that moment.
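A rough sketch of marshalling tree nodes on demand might look like the following. The NativeNode layout and the LazyTree helpers are hypothetical, and the Alloc method only exists so the example can build a tree in unmanaged memory without a real native library.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native node: struct Node { int value; Node* left; Node* right; }
[StructLayout(LayoutKind.Sequential)]
struct NativeNode
{
    public int Value;
    public IntPtr Left;
    public IntPtr Right;
}

static class LazyTree
{
    // Copies exactly one node into managed memory.
    public static NativeNode Load(IntPtr node)
    {
        return (NativeNode)Marshal.PtrToStructure(node, typeof(NativeNode));
    }

    // Follows left links only; right subtrees are never marshalled.
    public static int LeftmostValue(IntPtr root)
    {
        NativeNode node = Load(root);
        while (node.Left != IntPtr.Zero)
        {
            node = Load(node.Left);
        }
        return node.Value;
    }

    // Demo helper standing in for native code: writes one node to unmanaged memory.
    public static IntPtr Alloc(int value, IntPtr left, IntPtr right)
    {
        IntPtr p = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(NativeNode)));
        Marshal.StructureToPtr(new NativeNode { Value = value, Left = left, Right = right }, p, false);
        return p;
    }

    static void Main()
    {
        IntPtr leaf = Alloc(1, IntPtr.Zero, IntPtr.Zero);
        IntPtr right = Alloc(9, IntPtr.Zero, IntPtr.Zero);
        IntPtr root = Alloc(5, leaf, right);

        Console.WriteLine(LeftmostValue(root));

        Marshal.FreeHGlobal(root);
        Marshal.FreeHGlobal(right);
        Marshal.FreeHGlobal(leaf);
    }
}
```

Only the nodes on the visited path are copied; the child pointers stay as plain IntPtr values until something follows them.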

JaredPar answered Oct 15 '22 05:10


In addition to JaredPar's comprehensive answer: you don't need to use GCHandle; you can use unsafe code instead.

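// Requires an unsafe context (compile with /unsafe), and CPP_STRUCT_DEF
// must contain no managed reference fields for the pointer cast to compile.
fixed (byte* pBuffer = buffer) {
    record = *((CPP_STRUCT_DEF*)pBuffer);
}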

The whole purpose of GCHandle and the fixed statement is to pin the particular memory segment, making it immovable from the GC's point of view. If the memory were movable, any relocation would render your pointers invalid.

I'm not sure which way is faster, though.
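For completeness, here is a self-contained sketch of the fixed approach that decodes a whole buffer of records in one pinned block instead of pinning once per record. WireRecord is a hypothetical 8-byte stand-in for CPP_STRUCT_DEF, FixedReader and Decode are names invented for the example, and the code has to be compiled with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for CPP_STRUCT_DEF: two 4-byte fields, no padding.
[StructLayout(LayoutKind.Sequential)]
struct WireRecord
{
    public int Id;
    public float Value;
}

static class FixedReader
{
    // Decodes every complete record in `data`; trailing partial bytes are ignored.
    public static unsafe WireRecord[] Decode(byte[] data)
    {
        int size = sizeof(WireRecord);
        WireRecord[] records = new WireRecord[data.Length / size];

        // The array is pinned once for the whole loop.
        fixed (byte* p = data)
        {
            for (int i = 0; i < records.Length; i++)
            {
                records[i] = *(WireRecord*)(p + i * size);
            }
        }
        return records;
    }

    static void Main()
    {
        // Build two records by hand in host byte order.
        byte[] data = new byte[16];
        BitConverter.GetBytes(1).CopyTo(data, 0);
        BitConverter.GetBytes(2.5f).CopyTo(data, 4);
        BitConverter.GetBytes(2).CopyTo(data, 8);
        BitConverter.GetBytes(4.0f).CopyTo(data, 12);

        foreach (WireRecord r in Decode(data))
        {
            Console.WriteLine(r.Id + " " + r.Value);
        }
    }
}
```

In the file-reading scenario from the question, `data` could be the result of a single large read, so both the pin and the read happen once per chunk rather than once per record. Whether that is actually faster is something only profiling will tell.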

arul answered Oct 15 '22 07:10