Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Obtain non-explicit field offset

I have the following class:

[StructLayout(LayoutKind.Sequential)]
class Class
{
    public int Field1;
    public byte Field2;
    public short? Field3;
    public bool Field4;
}

How can I get the byte offset of Field4 starting from the start of the class data (or object header)?
To illustrate:

Class cls = new Class();
fixed(int* ptr1 = &cls.Field1) //first field
fixed(bool* ptr2 = &cls.Field4) //requested field
{
    Console.WriteLine((byte*)ptr2-(byte*)ptr1);
}

The resulting offset is, in this case, 5, because the runtime actually moves Field3 to the end of the type (and pads it), probably because it its type is generic. I know there is Marshal.OffsetOf, but it returns unmanaged offset, not managed.

How can I retrieve this offset from a FieldInfo instance? Is there any .NET method used for that, or do I have to write my own, taking all the exceptions into account (type size, padding, explicit offsets, etc.)?

like image 250
IS4 Avatar asked Jun 13 '15 11:06

IS4


1 Answers

Offset of a field within a class or struct in .NET 4.7.2:

public static int GetFieldOffset(this FieldInfo fi) =>
                    GetFieldOffset(fi.FieldHandle);

public static int GetFieldOffset(RuntimeFieldHandle h) => 
                    Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;

These return the byte offset of a field within a class or struct, relative to the layout of some respective managed instance at runtime. This works for all StructLayout modes, and for both value- and reference-types (including generics, reference-containing, or otherwise non-blittable). The offset value is zero-based relative to the beginning of the user-defined content or 'data body' of the struct or class only, and doesn't include any header, prefix, or other pad bytes.

Discussion

Since struct types have no header, the returned integer offset value can used directly via pointer arithmetic, and System.Runtime.CompilerServices.Unsafe if necessary (not shown here). Reference-type objects, on the other hand, have a header which has to be skipped-over in order to reference the desired field. This object header is usually a single IntPtr, which means IntPtr.Size needs to be added to the the offset value. It is also necessary to dereference the GC ("garbage collection") handle to obtain the object's address in the first place.

With these considerations, we can synthesize a tracking reference to the interior of a GC object at runtime by combining the field offset (obtained via the method shown above) with an instance of the class (e.g. an Object handle).

The following method, which is only meaningful for class (and not struct) types, demonstrates the technique. For simplicity, it uses ref-return and the System.Runtime.CompilerServices.Unsafe libary. Error checking, such as asserting fi.DeclaringType.IsSubclassOf(obj.GetType()) for example, is also elided for simplicity.

/// <summary>
/// Returns a managed reference ("interior pointer") to the value or instance of type 'U'
/// stored in the field indicated by 'fi' within managed object instance 'obj'
/// </summary>
public static unsafe ref U RefFieldValue<U>(Object obj, FieldInfo fi)
{
    var pobj = Unsafe.As<Object, IntPtr>(ref obj);
    pobj += IntPtr.Size + GetFieldOffset(fi.FieldHandle);
    return ref Unsafe.AsRef<U>(pobj.ToPointer());
}

This method returns a managed "tracking" pointer into the interior of the garbage-collected object instance obj.[see comment] It can be used to arbitrarily read or write the field, so this one function replaces the traditional pair of separate getter/setter functions. Although the returned pointer cannot be stored in the GC heap and thus has a lifetime limited to the scope of the current stack frame (i.e., and below), it is very cheap to obtain at any time by simply calling the function again.

Note that this generic method is only parameterized with <U>, the type of the fetched pointed-at value, and not for the type ("<T>", perhaps) of the containing class (the same applies for the IL version below). It's because the bare-bones simplicity of this technique doesn't require it. We already know that the containing instance has to be a reference (class) type, so at runtime it will present via a reference handle to a GC object with object header, and those facts alone are sufficient here; nothing further needs to be known about putative type "T".

It's a matter of opinion whether adding vacuous <T, … >, which would allow us to indicate the where T: class constraint, would improve the look or feel of the example above. It certainly wouldn't hurt anything; I believe the JIT is smart enough to not generate additional generic method instantiations for generic arguments that have no effect. But since doing so seems chatty (other than for stating the constraint), I opted for the minimalism of strict necessity here.

In my own use, rather than passing a FieldInfo or its respective FieldHandle every time, what I actually retain are the various integer offset values for the fields of interest as returned from GetFieldOffset, since these are also invariant at runtime, once obtained. This eliminates the extra step (of calling GetFieldOffset) each time the pointer is fetched. In fact, since I am able to include IL code in my projects, here is the exact code that I use for the function above. As with the C# just shown, it trivially synthesizes a managed pointer from a containing GC-object obj, plus a (retained) integer offset offs within it.

// Returns a managed 'ByRef' pointer to the (struct or reference-type) instance of type U 
// stored in the field at byte offset 'offs' within reference type instance 'obj'

.method public static !!U& RefFieldValue<U>(object obj, int32 offs) aggressiveinlining
{
    ldarg obj
    ldarg offs
    sizeof object
    add
    add
    ret
}

So even if you are not able to directly incorporate this IL, showing it here, I think, nicely illustrates the extremely low runtime overhead and alluring simplicity, in general, of this technique.

Example usage

class MyClass { public byte b_bar; public String s0, s1; public int iFoo; }

The first demonstration gets the integer offset of reference-typed field s1 within an instance of MyClass, and then uses it to get and set the field value.

var fi = typeof(MyClass).GetField("s1");

// note that we can get a field offset without actually
// having any instance of 'MyClass'
var offs = GetFieldOffset(fi);

// i.e., later... 

var mc = new MyClass();

RefFieldValue<String>(mc, offs) = "moo-maa";      // field "setter"

// note: method call used as l-value, on the left-hand side of '=' assignment!

RefFieldValue<String>(mc, offs) += "!!";          // in-situ access

Console.WriteLine(mc.s1);                         // --> moo-maa!! (in the original)

// can be used as a non-ref "getter" for by-value access
var _ = RefFieldValue<String>(mc, offs) + "%%";   // 'mc.s1' not affected

If this seems a bit cluttered, you can dramatically clean it up by retaining the managed pointer as ref local variable. As you know, this type of pointer is automatically adjusted--with interior offset preserved--whenever the GC moves the containing object. This means that it will remain valid even as you continue accessing the field unawares. In exchange for allowing this capability, the CLR requires that the ref local variable itself not be allowed to escape its stack frame, which in this case is enforced by the C# compiler.

// demonstrate using 'RuntimeFieldHandle', and accessing a value-type
// field (int) this time
var h = typeof(MyClass).GetField(nameof(mc.iFoo)).FieldHandle; 

// later... (still using 'mc' instance created above)

// acquire managed pointer to 'mc.iFoo'
ref int i = ref RefFieldValue<int>(mc, h);      

i = 21;                                        // directly affects 'mc.iFoo'
Console.WriteLine(mc.iFoo == 21);              // --> true

i <<= 1;                                       // operates directly on 'mc.iFoo'
Console.WriteLine(mc.iFoo == 42);              // --> true

// any/all 'ref' uses of 'i' just affect 'mc.iFoo' directly:
Interlocked.CompareExchange(ref i, 34, 42);    // 'mc.iFoo' (and 'i' also): 42 -> 34

Summary

The usage examples focused on using the technique with a class object, but as noted, the GetFieldOffset method shown here works perfectly fine with struct as well. Just be sure not to use the RefFieldValue method with value-types, since that code includes adjusting for an expected object header. For that simpler case, just use System.Runtime.CompilerServicesUnsafe.AddByteOffset for your address arithmetic instead.

Needless to say, this technique might seem a bit radical to some. I'll just note that it has worked flawlessly for me for many years, specifically on .NET Framework 4.7.2, and including 32- and 64-bit mode, debug vs. release, plus whichever various JIT optimization settings I've tried.

like image 121
Glenn Slayden Avatar answered Oct 06 '22 19:10

Glenn Slayden