
C# interop: bad interaction between fixed and MarshalAs

I need to marshal some nested structures in C# 4.0 into binary blobs to pass to a C++ framework.

So far I have had a lot of success using unsafe/fixed to handle fixed-length arrays of primitive types. Now I need to handle a structure that contains nested fixed-length arrays of other structures.

I was using complicated workarounds that flatten the structures, but then I came across an example of the MarshalAs attribute which looked like it could save me a great deal of trouble.

Unfortunately, whilst it gives me the correct amount of data, it also seems to stop the fixed arrays from being marshalled properly, as the output of this program demonstrates. You can confirm the failure by putting a breakpoint on the last line and examining the memory at each pointer.

using System;
using System.Threading;
using System.Runtime.InteropServices;

namespace MarshalNested
{
  public unsafe struct a_struct_test1
  {
    public fixed sbyte a_string[3];
    public fixed sbyte some_data[12];
  }

  public struct a_struct_test2
  {
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public sbyte[] a_string;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public a_nested[] some_data;
  }

  public unsafe struct a_struct_test3
  {
    public fixed sbyte a_string[3];
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public a_nested[] some_data;
  }


  public unsafe struct a_nested
  {
    public fixed sbyte a_notherstring[3];
  }

  class Program
  {
    static unsafe void Main(string[] args)
    {
      a_struct_test1 lStruct1 = new a_struct_test1();
      lStruct1.a_string[0] = (sbyte)'a';
      lStruct1.a_string[1] = (sbyte)'b';
      lStruct1.a_string[2] = (sbyte)'c';

      a_struct_test2 lStruct2 = new a_struct_test2();
      lStruct2.a_string = new sbyte[3];
      lStruct2.a_string[0] = (sbyte)'a';
      lStruct2.a_string[1] = (sbyte)'b';
      lStruct2.a_string[2] = (sbyte)'c';

      a_struct_test3 lStruct3 = new a_struct_test3();
      lStruct3.a_string[0] = (sbyte)'a';
      lStruct3.a_string[1] = (sbyte)'b';
      lStruct3.a_string[2] = (sbyte)'c';

      IntPtr lPtr1 = Marshal.AllocHGlobal(15);
      Marshal.StructureToPtr(lStruct1, lPtr1, false);

      IntPtr lPtr2 = Marshal.AllocHGlobal(15);
      Marshal.StructureToPtr(lStruct2, lPtr2, false);

      IntPtr lPtr3 = Marshal.AllocHGlobal(15);
      Marshal.StructureToPtr(lStruct3, lPtr3, false);

      string s1 = "";
      string s2 = "";
      string s3 = "";
      for (int x = 0; x < 3; x++)
      {
        s1 += (char) Marshal.ReadByte(lPtr1+x);
        s2 += (char) Marshal.ReadByte(lPtr2+x);
        s3 += (char) Marshal.ReadByte(lPtr3+x);
      }

      Console.WriteLine("Ptr1 (size " + Marshal.SizeOf(lStruct1) + ") says " + s1);
      Console.WriteLine("Ptr2 (size " + Marshal.SizeOf(lStruct2) + ") says " + s2);
      Console.WriteLine("Ptr3 (size " + Marshal.SizeOf(lStruct3) + ") says " + s3);

      Thread.Sleep(10000);
    }
  }
}

Output:

Ptr1 (size 15) says abc
Ptr2 (size 15) says abc
Ptr3 (size 15) says a

So for some reason it is only marshalling the first character of my fixed ANSI strings. Is there any way around this, or have I done something stupid unrelated to the marshalling?

Derf Skren asked Mar 17 '16 22:03



1 Answer

This is a case of a missing diagnostic. Somebody should have spoken up and told you that your declaration is not supported. That somebody is either the C# compiler, producing a compile error, or the CLR field marshaller, producing a runtime exception.

It's not like you can't get a diagnostic. You'll certainly get one when you actually start using the struct as intended:

    a_struct_test3 lStruct3 = new a_struct_test3();
    lStruct3.some_data = new a_nested[4];
    lStruct3.some_data[0] = new a_nested();
    lStruct3.some_data[0].a_notherstring[0] = (sbyte)'a';  // Eek!

Which elicits CS1666, "You cannot use fixed size buffers contained in unfixed expressions. Try using the fixed statement". Not that "try this" advice is all that helpful:

    fixed (sbyte* p = &lStruct3.some_data[0].a_notherstring[0])  // Eek!
    {
        *p = (sbyte)'a';
    }

Exact same CS1666 error. The next thing you'd try is to put an attribute on the fixed buffer:

    public unsafe struct a_struct_test3 {
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
        public fixed sbyte a_string[3];
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
        public a_nested[] some_data;
    }
    //...

    a_struct_test3 lStruct3 = new a_struct_test3();
    lStruct3.some_data = new a_nested[4];
    IntPtr lPtr3 = Marshal.AllocHGlobal(15);
    Marshal.StructureToPtr(lStruct3, lPtr3, false);  // Eek!

Keeps the C# compiler happy but now the CLR speaks up and you get a TypeLoadException at runtime: "Additional information: Cannot marshal field 'a_string' of type 'MarshalNested.a_struct_test3': Invalid managed/unmanaged type combination (this value type must be paired with Struct)."

So, in a nutshell, you should have gotten either CS1666 or a TypeLoadException on your original attempt as well. That did not happen at compile time because the C# compiler was not forced to look at the bad part; it only generates CS1666 on a statement that accesses the array. And it did not happen at runtime because the field marshaller in the CLR did not attempt to marshal the array, since it is null. You can file a feedback report at connect.microsoft.com but I'd be greatly surprised if they didn't close it as "by design".


In general, an obscure detail matters a great deal to the field marshaller in the CLR, the chunk of code that converts struct values and class objects from their managed layout to their unmanaged layout. It is poorly documented; Microsoft does not want to nail down the exact implementation details, mostly because they depend too much on the target architecture.

What matters a great deal is whether or not a value or object is blittable. It is blittable when the managed and unmanaged layouts are identical, which only happens when every member of the type has the exact same size and alignment in both layouts. That is normally only the case when the fields are of a very simple value type (like byte or int) or a struct that is itself blittable. Notoriously not when it is bool; there are too many conflicting unmanaged bool types. A field of an array type is never blittable; managed arrays don't look anything like C arrays since they have an object header and a Length member.
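One rough way to probe blittability from code is to try pinning a boxed value: GCHandle.Alloc with GCHandleType.Pinned throws ArgumentException when the object holds non-blittable data. The sketch below relies on that behaviour; the type names and the CanPin helper are hypothetical and not part of the question's code.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BlittableProbe
{
    // Only simple value-type fields, so this struct should be blittable.
    [StructLayout(LayoutKind.Sequential)]
    public struct Plain
    {
        public int Id;
        public byte Flag;
    }

    // The array field is a reference to a managed object, so this is not blittable.
    [StructLayout(LayoutKind.Sequential)]
    public struct WithArray
    {
        public int Id;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
        public sbyte[] Text;
    }

    // Hypothetical helper: returns true when the boxed value can be pinned.
    public static bool CanPin(object boxed)
    {
        try
        {
            GCHandle handle = GCHandle.Alloc(boxed, GCHandleType.Pinned);
            handle.Free();
            return true;
        }
        catch (ArgumentException)
        {
            // Raised for objects that contain non-blittable data.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Plain can be pinned: " + CanPin(new Plain()));
        Console.WriteLine("WithArray can be pinned: " + CanPin(new WithArray()));
    }
}
```

This is only a diagnostic aid; it says nothing about whether the layout actually matches the native declaration.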

Having a blittable value or object is highly desirable, since it saves the field marshaller from having to create a copy. The native code gets a simple pointer to managed memory and all that is needed is to pin the memory. Very fast. It is also very dangerous: if the declaration does not match then the native code can easily color outside the lines and corrupt the GC heap or stack frame. That is a very common reason for a program that uses pinvoke to bomb randomly with ExecutionEngineException, excessively difficult to diagnose. Such a declaration really deserves the unsafe keyword but the C# compiler does not insist on it. Nor can it; compilers are not allowed to make any assumptions about managed object layout. You keep it safe by using Debug.Assert() on the return value of Marshal.SizeOf<T>(), which must be an exact match with the value of sizeof(T) in a C program.
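A minimal sketch of that size check, using a struct shaped like a_struct_test1 from the question. The class name, the Pack = 1 setting and the ExpectedNativeSize constant are assumptions; the 15 stands in for whatever sizeof reports for the matching struct on the C++ side. It must be compiled with /unsafe.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class LayoutCheck
{
    // Same fields as a_struct_test1 in the question: 3 + 12 bytes.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public unsafe struct Flat
    {
        public fixed sbyte a_string[3];
        public fixed sbyte some_data[12];
    }

    // Assumption: the native side reports sizeof(...) == 15 for its struct.
    public const int ExpectedNativeSize = 15;

    public static int ManagedSize()
    {
        // The non-generic overload is used so the sketch also builds on .NET 4.0.
        return Marshal.SizeOf(typeof(Flat));
    }

    public static void Main()
    {
        // Fails loudly in a debug build if the two layouts drift apart.
        Debug.Assert(ManagedSize() == ExpectedNativeSize, "managed/native size mismatch");
        Console.WriteLine("Marshal.SizeOf = " + ManagedSize());
    }
}
```

A matching size does not prove that every field offset matches, but a mismatch is a reliable sign that the declaration is wrong.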

As noted, arrays are an obstacle to getting a blittable value or object. The fixed keyword is intended as a workaround for this. The CLR treats a fixed size buffer like an opaque value type with no members, just a blob of bytes. No object header and no Length member, as close as you can get to a C array. And it is used in C# code like you'd use an array in a C program: you must use a pointer to address the array elements and check three times that you don't color outside the lines. Sometimes you must use a fixed array; this happens when you declare a union (overlapping fields) and you overlap an array with a value. Overlapping a managed array with a value is poison to the garbage collector, since it can no longer figure out if the field stores an object root. That is not detected by the C# compiler but reliably trips a TypeLoadException at runtime.
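The union case can be sketched with LayoutKind.Explicit and a fixed size buffer that overlaps an int. The IntBytes type and the LowByte helper are hypothetical names chosen for illustration; compile with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnionDemo
{
    // C-style union: the int and the 4-byte buffer share the same storage.
    [StructLayout(LayoutKind.Explicit)]
    public unsafe struct IntBytes
    {
        [FieldOffset(0)] public int Value;
        [FieldOffset(0)] public fixed byte Bytes[4];
    }

    // Returns the byte stored at offset 0 of the union.
    public static unsafe byte LowByte(int value)
    {
        IntBytes u = new IntBytes();
        u.Value = value;
        // u is a local, so the buffer can be indexed without a fixed statement.
        return u.Bytes[0];
    }

    public static void Main()
    {
        // Which byte sits at offset 0 depends on the machine's endianness.
        Console.WriteLine("Byte at offset 0: " + LowByte(0x04030201));
        Console.WriteLine("Little-endian: " + BitConverter.IsLittleEndian);
    }
}
```

Declaring Bytes as a managed byte[] at the same offset is the variant that the runtime rejects.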


Long story short, use fixed only for a blittable type. Mixing fields of a fixed size buffer type with fields that must be marshaled cannot work. And it isn't useful either; the object or value gets copied anyway so you might as well use the friendly array type.
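For the nested case in the question, one way to follow that advice is to declare every array, including the one inside the nested struct, as a managed array with MarshalAs. This is a sketch rather than a tested drop-in: the Nested, Outer, Build and ToBlob names are hypothetical, Pack = 1 is an assumption about the native layout, and every array must be allocated with at least SizeConst elements before marshalling.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class NestedArrays
{
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct Nested
    {
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
        public sbyte[] a_notherstring;
    }

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct Outer
    {
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
        public sbyte[] a_string;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
        public Nested[] some_data;
    }

    // Fills every array so the marshaller has something to copy.
    public static Outer Build()
    {
        Outer o = new Outer();
        o.a_string = new sbyte[] { (sbyte)'a', (sbyte)'b', (sbyte)'c' };
        o.some_data = new Nested[4];
        for (int i = 0; i < o.some_data.Length; i++)
        {
            o.some_data[i].a_notherstring = new sbyte[] { (sbyte)'x', (sbyte)'y', (sbyte)'z' };
        }
        return o;
    }

    // Marshals the struct into unmanaged memory and copies the bytes back out.
    public static byte[] ToBlob(Outer value)
    {
        int size = Marshal.SizeOf(typeof(Outer));
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);
            byte[] blob = new byte[size];
            Marshal.Copy(ptr, blob, 0, size);
            return blob;
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }

    public static void Main()
    {
        byte[] blob = ToBlob(Build());
        Console.WriteLine("size " + blob.Length + ": " + Encoding.ASCII.GetString(blob));
    }
}
```

The struct is copied on every call, which is the cost the answer describes; in exchange the fields can be accessed without pointers or an unsafe context.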

Hans Passant answered Oct 19 '22 06:10