Why does casting a struct to a similar class sort-of work?

Warning: This is merely an exercise for those who are passionate about breaking stuff to understand its mechanics.

I was exploring the limits of what I could accomplish in C# and I wrote a ForceCast() function to perform a brute-force cast without any type checks. Never consider using this function in production code.

I wrote a class called Original and a struct called LikeOriginal, each with two integer fields. In Main() I created a variable called orig and set it to a new instance of Original with a=7 and b=20. When orig is cast to LikeOriginal and stored in casted, the values of cG and dG are garbage. That is to be expected: LikeOriginal is a struct, and class instances carry more metadata than struct instances, so the memory layouts do not match.

Example Output:

Casted Original to LikeOriginal
1300246376, 542
1300246376, 542
added 3
Casted LikeOriginal back to Original
1300246379, 545

Notice, however, that when I call casted.Add(3), cast back to Original, and print the values of a and b, they have surprisingly been incremented by 3, and this is repeatable.

What confuses me is that casting the class to the struct makes cG and dG map onto class metadata, yet when they are modified and cast back to a class, they map correctly onto a and b.

Why is this the case?

The code used:

using System;
using System.Runtime.InteropServices;

namespace BreakingStuff {
    public class Original {
        public int a, b;

        public Original(int a, int b)
        {
            this.a = a;
            this.b = b;
        }

        public void Add(int val)
        {
        }
    }

    public struct LikeOriginal {
        public int cG, dG;

        public override string ToString() {
            return cG + ", " + dG;
        }

        public void Add(int val) {
            cG += val;
            dG += val;
        }
    }

    public static class Program {
        public unsafe static void Main() {
            Original orig = new Original(7, 20);
            LikeOriginal casted = ForceCast<Original, LikeOriginal>(orig);
            Console.WriteLine("Casted Original to LikeOriginal");
            Console.WriteLine(casted.cG + ", " + casted.dG);
            Console.WriteLine(casted.ToString());
            casted.Add(3);
            Console.WriteLine("added 3");
            orig = ForceCast<LikeOriginal, Original>(casted);
            Console.WriteLine("Casted LikeOriginal back to Original");
            Console.WriteLine(orig.a + ", " + orig.b);
            Console.ReadLine();
        }

        //performs a pointer cast but with the same memory layout.
        private static unsafe TOut ForceCast<TIn, TOut>(this TIn input) {
            GCHandle handle = GCHandle.Alloc(input);
            TOut result = Read<TOut>(GCHandle.ToIntPtr(handle));
            handle.Free();
            return result;
        }

        private static unsafe T Read<T>(this IntPtr address) {
            T obj = default(T);
            if (address == IntPtr.Zero)
                return obj;
            TypedReference tr = __makeref(obj);
            *(IntPtr*) (&tr) = address;
            return __refvalue(tr, T);
        }
    }
}
asked Aug 08 '17 10:08 by MathuSum Mut


2 Answers

Edit: Long story short: first create a ForceCast function that correctly handles both identity conversions, ForceCast<LikeOriginal, LikeOriginal> and ForceCast<Original, Original>; then you might have a chance of getting actual conversions to work.

A working sample

By providing separate code paths for class->class (CC), class->struct (CS), struct->class (SC) and struct->struct (SS), and using Nullable<T> as an intermediate for structs, I got a working example:

// class -> class
private static unsafe TOut ForceCastCC<TIn, TOut>(TIn input)
    where TIn : class
    where TOut : class
{
    var handle = __makeref(input);
    return Read<TOut>(*(IntPtr*)(&handle));
}

// struct -> struct, require nullable types for in-out
private static unsafe TOut? ForceCastSS<TIn, TOut>(TIn? input)
    where TIn : struct
    where TOut : struct
{
    var handle = __makeref(input);
    return Read<TOut?>(*(IntPtr*)(&handle));
}

// class -> struct
private static unsafe TOut? ForceCastCS<TIn, TOut>(TIn input)
    where TIn : class
    where TOut : struct
{
    var handle = __makeref(input);
    // one extra de-reference of the input pointer
    return Read<TOut?>(*(IntPtr*)*(IntPtr*)(&handle));
}

// struct -> class
private static unsafe TOut ForceCastSC<TIn, TOut>(TIn? input)
    where TIn : struct
    where TOut : class
{
    // get a real pointer to the struct, so it can be turned into a reference type
    var handle = GCHandle.Alloc(input);
    var result = Read<TOut>(GCHandle.ToIntPtr(handle));
    handle.Free();
    return result;
}

Now use the appropriate function in your sample and handle the nullable types like the compiler demands:

Original orig = new Original(7, 20);
LikeOriginal casted = ForceCastCS<Original, LikeOriginal>(orig) ?? default(LikeOriginal);
Console.WriteLine("Casted Original to LikeOriginal");
Console.WriteLine(casted.cG + ", " + casted.dG);
Console.WriteLine(casted.ToString());
casted.Add(3);
Console.WriteLine("added 3");
orig = ForceCastSC<LikeOriginal, Original>(casted);
Console.WriteLine("Casted LikeOriginal back to Original");
Console.WriteLine(orig.a + ", " + orig.b);

Console.ReadLine();

For me, this returns the correct numbers at each point.


Details

Some details:

Basically, your problem is you treat a value type like a reference type...

Let's first look at the working case: LikeOriginal -> Original:

var h1 = GCHandle.Alloc(likeOriginal);
var ptr1 = GCHandle.ToIntPtr(h1);

This creates a pointer to the memory area of LikeOriginal (Edit: actually, not exactly that memory area, see below).

var obj1 = default(Original);
TypedReference t1 = __makeref(obj1);
*(IntPtr*)(&t1) = ptr1;

This creates a reference (pointer) typed as Original whose value is the pointer to LikeOriginal.

var original = __refvalue( t1,Original);

This turns the typed reference into a managed reference, pointing to the memory of LikeOriginal. All values of the starting likeOriginal object are retained.

Now let's analyze an intermediate case that should work if your code worked in both directions: LikeOriginal -> LikeOriginal:

var h2 = GCHandle.Alloc(likeOriginal);
var ptr2 = GCHandle.ToIntPtr(h2);

Again, we have a pointer to the memory area of LikeOriginal.

var obj2 = default(LikeOriginal);
TypedReference t2 = __makeref(obj2);

Here is the first hint of what is going wrong: __makeref(obj2) creates a reference to the LikeOriginal object itself, not to some separate area where a pointer is stored.

*(IntPtr*)(&t2) = ptr2;

ptr2, however, is a pointer to some reference value.

var likeOriginal2 = __refvalue( t2,LikeOriginal);

Here we get garbage, because t2 is supposed to be a direct reference to the object memory rather than a reference to some pointer memory.


The following is some test code I ran to get a better understanding of your approach and of what goes wrong (some of it fairly structured, followed by parts where I tried some additional things):

Original o1 = new Original(111, 222);
LikeOriginal o2 = new LikeOriginal { cG = 333, dG = 444 };

// get handles to the objects themselves and to their individual fields
GCHandle h1 = GCHandle.Alloc(o1);
GCHandle h2 = GCHandle.Alloc(o1.a);
GCHandle h3 = GCHandle.Alloc(o1.b);
GCHandle h4 = GCHandle.Alloc(o2);
GCHandle h5 = GCHandle.Alloc(o2.cG);
GCHandle h6 = GCHandle.Alloc(o2.dG);

// get pointers from the handles, each pointer has an individual value
IntPtr i1 = GCHandle.ToIntPtr(h1);
IntPtr i2 = GCHandle.ToIntPtr(h2);
IntPtr i3 = GCHandle.ToIntPtr(h3);
IntPtr i4 = GCHandle.ToIntPtr(h4);
IntPtr i5 = GCHandle.ToIntPtr(h5);
IntPtr i6 = GCHandle.ToIntPtr(h6);

// get typed references for the objects and properties
TypedReference t1 = __makeref(o1);
TypedReference t2 = __makeref(o1.a);
TypedReference t3 = __makeref(o1.b);
TypedReference t4 = __makeref(o2);
TypedReference t5 = __makeref(o2.cG);
TypedReference t6 = __makeref(o2.dG);

// get the associated pointers
IntPtr j1 = *(IntPtr*)(&t1);
IntPtr j2 = *(IntPtr*)(&t2); // j1 != j2, because a class handle points to the pointer/reference memory
IntPtr j3 = *(IntPtr*)(&t3);
IntPtr j4 = *(IntPtr*)(&t4);
IntPtr j5 = *(IntPtr*)(&t5); // j4 == j5, because a struct handle points directly to the instance memory
IntPtr j6 = *(IntPtr*)(&t6);

// direct translate-back is working for all objects and properties
var r1 = __refvalue( t1,Original);
var r2 = __refvalue( t2,int);
var r3 = __refvalue( t3,int);
var r4 = __refvalue( t4,LikeOriginal);
var r5 = __refvalue( t5,int);
var r6 = __refvalue( t6,int);

// assigning the pointers that were obtained from the GCHandles
*(IntPtr*)(&t1) = i1;
*(IntPtr*)(&t2) = i2;
*(IntPtr*)(&t3) = i3;
*(IntPtr*)(&t4) = i4;
*(IntPtr*)(&t5) = i5;
*(IntPtr*)(&t6) = i6;

// translate back the changed references
var s1 = __refvalue( t1,Original); // Ok
// rest is garbage values!
var s2 = __refvalue( t2,int);
var s3 = __refvalue( t3,int);
var s4 = __refvalue( t4,LikeOriginal);
var s5 = __refvalue( t5,int);
var s6 = __refvalue( t6,int);

// a variation, primitively dereferencing the pointer to get to the actual memory
*(IntPtr*)(&t4) = *(IntPtr*)i4;
var s4_1 = __refvalue( t4,LikeOriginal); // partial result, getting { garbage, 333 } instead of { 333, 444 }

// prepare TypedReference for translation between Original and LikeOriginal
var obj1 = default(Original);
var obj2 = default(LikeOriginal);
TypedReference t7 = __makeref(obj1);
TypedReference t8 = __makeref(obj2);

// translate between Original and LikeOriginal
*(IntPtr*)(&t7) = i4; // From struct to class, the pointer acquired through GCHandle is appropriate
var s7 = __refvalue( t7,Original); // Ok

*(IntPtr*)(&t8) = *(IntPtr*)j1;
var s8 = __refvalue( t8,LikeOriginal); // Not Ok - Original has some value coming before its first member - getting { garbage, 111 } instead of { 111, 222 }

*(IntPtr*)(&t8) = j2;
var s9 = __refvalue( t8,LikeOriginal); // Ok by starting at the address of the first member

Conclusion: Going via GCHandle -> IntPtr creates a pointer that points one memory location in front of the first member, no matter whether the starting point is a struct or a class. As a result, struct -> class and class -> class work, but class -> struct and struct -> struct do not.

The only way I found for targeting structs is to get a pointer to their first member (which in case of an input struct equals the __makeref to the struct without going via GCHandle).
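To illustrate that last point in isolation, here is a minimal sketch that skips GCHandle and TypedReference entirely and takes the address of the first field with a fixed statement. The helper name AddThroughStructView and the class FirstMemberDemo are made up for this illustration, and the code assumes that a and b sit next to each other in declaration order, as the s9 test above suggests. It must be compiled with unsafe code enabled.

```csharp
using System;

public class Original
{
    public int a, b;
    public Original(int a, int b) { this.a = a; this.b = b; }
}

public struct LikeOriginal
{
    public int cG, dG;
    public void Add(int val) { cG += val; dG += val; }
}

public static class FirstMemberDemo
{
    // Hypothetical helper (not part of the answer's code): views the fields of
    // an Original through a LikeOriginal pointer that starts at the first field.
    // Assumption: a and b are laid out contiguously in declaration order.
    public static unsafe void AddThroughStructView(Original target, int val)
    {
        // fixed pins the object so the GC cannot move it while the raw pointer is in use.
        fixed (int* firstField = &target.a)
        {
            LikeOriginal* view = (LikeOriginal*)firstField;
            view->Add(val); // writes through to target.a and target.b
        }
    }

    public static void Main()
    {
        var orig = new Original(7, 20);
        AddThroughStructView(orig, 3);
        Console.WriteLine(orig.a + ", " + orig.b);
    }
}
```

If the layout assumption holds, both fields of orig end up incremented by 3, without any copy of the object being made.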

answered Oct 06 '22 01:10 by grek40


Here is how I see this situation. You have acted upon the reference to Original as if it were a reference to LikeOriginal. The critical point here is that you are invoking the LikeOriginal.Add() method, whose address is resolved statically at compile time.

This method, in turn, operates on a this reference which it implicitly receives. Therefore, it modifies values which are offset by 0 and by 4 bytes relative to this reference it has in its hands.
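The 0-byte and 4-byte offsets can be checked on the struct alone. The following is only a sketch: the class OffsetDemo and its methods are hypothetical names introduced here, Marshal.OffsetOf reports the marshaled layout (which for a struct of two ints is assumed to match the in-memory one), and unsafe code must be enabled for the pointer part.

```csharp
using System;
using System.Runtime.InteropServices;

public struct LikeOriginal
{
    public int cG, dG;
    public void Add(int val) { cG += val; dG += val; }
}

public static class OffsetDemo
{
    // Offsets of the two fields relative to the start of the struct.
    public static int OffsetOfCG() => (int)Marshal.OffsetOf<LikeOriginal>("cG");
    public static int OffsetOfDG() => (int)Marshal.OffsetOf<LikeOriginal>("dG");

    // Does by hand what Add(val) does: updates the ints at offsets 0 and 4.
    // value is a by-value parameter, so its address can be taken without pinning.
    public static unsafe LikeOriginal AddByOffsets(LikeOriginal value, int val)
    {
        int* p = (int*)&value;
        p[0] += val; // offset 0: cG
        p[1] += val; // offset 4: dG
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(OffsetOfCG() + ", " + OffsetOfDG());
        LikeOriginal s = AddByOffsets(new LikeOriginal { cG = 7, dG = 20 }, 3);
        Console.WriteLine(s.cG + ", " + s.dG);
    }
}
```

The point is that Add() only knows "the int at offset 0 and the int at offset 4 from this"; whatever address it is handed as this, those are the bytes it changes.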

Since this experiment worked out, it indicates that the field layouts of the Original object and the LikeOriginal struct are the same. I know that structs have a flat layout, which makes them useful when allocating arrays of structs: nothing is inserted into the sequence of bytes representing the flat content of the structs. That is precisely what does not hold for classes, which need one extra reference that is used to resolve virtual functions and the type at run time.

This reminds me that the lack of this extra reference is the core reason why structs do not support derivation: you wouldn't know whether you have a base or a derived struct in a later call.

Anyway, back to the surprising fact that this code worked fine. I have worked with C++ compilers, and I remember that they used to put the v-table pointer before the actual data content of the object. In other words, the this pointer used to point 4 bytes past the actual address of the memory block allocated for that object. Maybe C# does the same, in which case the this reference in a method invoked on Original points to a, just as the this reference in a method invoked on LikeOriginal points to cG.

answered Oct 06 '22 00:10 by Zoran Horvat