
Fast reading of a C structure that contains a char array

I have the following C structure:

struct MyStruct {
    char chArray[96];
    __int64 offset;
    unsigned count;
};

I now have a number of files, written from C, each containing thousands of these structures. I need to read them from C#, and speed is an issue.

I have done the following in C#:

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Size = 108)]
public struct PreIndexStruct {
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 96)]
    public string Key;
    public long Offset;
    public int Count;
}

Then I read the data from the file using:

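const int structSize = 108;   // 96 + 8 + 4, matches Size = 108 above
var structures = new List<PreIndexStruct>();

using (BinaryReader br = new BinaryReader(
       new FileStream(pathToFile, FileMode.Open, FileAccess.Read, 
                      FileShare.Read, bufferSize))) 
{
    long length = br.BaseStream.Length;
    long position = 0;

    byte[] buff = new byte[structSize];
    // Pin the buffer once so its address stays valid for every PtrToStructure call
    GCHandle buffHandle = GCHandle.Alloc(buff, GCHandleType.Pinned);
    try {
        while (position < length) {
            if (br.Read(buff, 0, structSize) != structSize)
                break;   // stop on a short read instead of marshaling stale bytes
            PreIndexStruct pis = (PreIndexStruct)Marshal.PtrToStructure(
                buffHandle.AddrOfPinnedObject(), typeof(PreIndexStruct));
            structures.Add(pis);

            position += structSize;
        }
    } finally {
        buffHandle.Free();   // always unpin, even if an exception is thrown
    }
}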

This works perfectly and I can retrieve the data just fine from the files.

I've read that I could speed things up by using C++/CLI or C# unsafe code instead of GCHandle.Alloc/Marshal.PtrToStructure. The examples I found, however, only cover structures without fixed-size arrays.

My question is: for my particular case, is there a faster way of doing this with C++/CLI or C# unsafe code?
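For reference, a minimal sketch of the unsafe approach, assuming the file holds packed 108-byte records (as `Size = 108` implies) written on a little-endian machine. `RawPreIndex`, `UnsafeReader`, `ReadAll` and `KeyOf` are hypothetical names. The C `char chArray[96]` becomes a `fixed byte` buffer, which keeps the struct blittable, so records can be copied straight out of a pinned byte array without `Marshal.PtrToStructure`. It must be compiled with unsafe code enabled (`/unsafe` or `AllowUnsafeBlocks`).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical blittable mirror of the C struct. Pack = 1 assumes the
// records in the file are 108 bytes with no padding (matching Size = 108).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct RawPreIndex
{
    public fixed byte Key[96]; // char chArray[96]
    public long Offset;        // __int64 offset
    public uint Count;         // unsigned count
}

public static class UnsafeReader
{
    // Copies every complete record out of the buffer through a pointer cast;
    // no marshaling and no per-record string allocation.
    public static unsafe List<RawPreIndex> ReadAll(byte[] data)
    {
        int size = sizeof(RawPreIndex); // 108 with Pack = 1
        int count = data.Length / size;
        var result = new List<RawPreIndex>(count);
        fixed (byte* p = data)
        {
            RawPreIndex* records = (RawPreIndex*)p;
            for (int i = 0; i < count; i++)
                result.Add(records[i]);
        }
        return result;
    }

    // Decodes the key up to the first NUL byte, as ByValTStr would.
    // Encoding.Default is the ANSI code page on .NET Framework (UTF-8 on .NET Core).
    public static unsafe string KeyOf(RawPreIndex record)
    {
        byte* key = record.Key; // by-value parameter is a fixed variable: no pinning needed
        int len = 0;
        while (len < 96 && key[len] != 0)
            len++;
        return new string((sbyte*)key, 0, len, Encoding.Default);
    }
}
```

Decoding the key lazily in `KeyOf` means the string allocation is only paid for records whose key is actually needed.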

EDIT

Additional performance info (I used ANTS Performance Profiler 7.4):

66% of my CPU time is used by calls to Marshal.PtrToStructure.

Regarding I/O, only 6 ms of the total 105 ms are spent reading from the file.
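Since the profile shows that file I/O is cheap and marshaling dominates, another option that stays in safe code is to read the whole file at once and decode each field at its byte offset. This is a sketch under the same assumptions (packed 108-byte records, little-endian, file small enough for `File.ReadAllBytes`); `PreIndexRecord`, `ManagedReader`, `Parse` and `ReadFile` are hypothetical names, and `Count` is a `uint` here to match the C `unsigned`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical managed record; no StructLayout/MarshalAs needed.
public struct PreIndexRecord
{
    public string Key;
    public long Offset;
    public uint Count;
}

public static class ManagedReader
{
    const int KeySize = 96;
    const int RecordSize = KeySize + 8 + 4; // 108, assuming no padding

    // Decodes every complete record in the buffer by fixed byte offsets.
    public static List<PreIndexRecord> Parse(byte[] data)
    {
        var result = new List<PreIndexRecord>(data.Length / RecordSize);
        for (int pos = 0; pos + RecordSize <= data.Length; pos += RecordSize)
        {
            // Stop the key at the first NUL, as ByValTStr does.
            int nul = Array.IndexOf(data, (byte)0, pos, KeySize);
            int keyLen = nul < 0 ? KeySize : nul - pos;
            result.Add(new PreIndexRecord
            {
                Key = Encoding.Default.GetString(data, pos, keyLen),
                // BitConverter uses the host byte order (little-endian on x86/x64).
                Offset = BitConverter.ToInt64(data, pos + KeySize),
                Count = BitConverter.ToUInt32(data, pos + KeySize + 8),
            });
        }
        return result;
    }

    // One large read instead of one Read call per record.
    public static List<PreIndexRecord> ReadFile(string pathToFile) =>
        Parse(File.ReadAllBytes(pathToFile));
}
```

Each record still allocates one string for its key, but nothing goes through the interop marshaler.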

asked Jan 30 '13 at 14:01 by Morat


2 Answers

In this case you don't need P/Invoke marshaling at all, since the struct never has to pass back and forth between managed and native code. You could read the fields directly instead, which avoids the unnecessary GC handle allocation and allocates only what's needed:

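public struct PreIndexStruct {
    public string Key;
    public long Offset;
    public int Count;
}

// Same loop as in the question; reader is the BinaryReader over the file
while (position < length) {
    PreIndexStruct pis = new PreIndexStruct();
    byte[] keyBytes = reader.ReadBytes(96);
    // ByValTStr stops at the first NUL byte; GetString alone would keep the padding
    int keyLen = Array.IndexOf(keyBytes, (byte)0);
    if (keyLen < 0) keyLen = keyBytes.Length;
    pis.Key = Encoding.Default.GetString(keyBytes, 0, keyLen);
    pis.Offset = reader.ReadInt64();
    pis.Count = reader.ReadInt32();
    structures.Add(pis);
    position += 108;   // 96 + 8 + 4 bytes per record
}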

I'm not sure you can be much faster than this.

answered Oct 27 '22 at 01:10 by Simon Mourier


What you probably want is unmanaged code. This is what I would do:

  1. Create a C++/CLI project and port your existing C# code so it runs there.
  2. Determine where your bottleneck is (use the profiler).
  3. Rewrite that section of the code in plain C++, call it from the C++/CLI code, make sure it works, and profile it again.
  4. Surround your new code with "#pragma unmanaged" (and "#pragma managed" after it).
  5. Profile it again.

You will probably get some speedup, but it may be smaller than you expect.

answered Oct 26 '22 at 23:10 by David Hope