
Visual C++ inline x86 assembly: Accessing "this" pointer

According to the MSDN documentation, the "this" pointer is stored in ECX when using the default __thiscall calling convention for class member functions. That is certainly the case when regular C++ code is compiled, but I have run into a problem when trying to access "this" from inline assembly.

Here's the test program:

#include <cstdio>

class TestClass
{
    long x;

    public:
        inline TestClass(long x):x(x){}

    public:
        inline long getX1(){return x;}
        inline long getX2()
        {
            _asm
            {
                mov eax,dword ptr[ecx]
            }
        }
};
int main()
{
    TestClass c(42);

    printf("c.getX1() = %ld\n",c.getX1());
    printf("c.getX2() = %ld\n",c.getX2());

    return 0;
}

The two Get functions are translated like this:

?getX1@TestClass@@QAEJXZ (public: long __thiscall TestClass::getX1(void)):
  00000000: 8B 01              mov         eax,dword ptr [ecx]
  00000002: C3                 ret

?getX2@TestClass@@QAEJXZ (public: long __thiscall TestClass::getX2(void)):
  00000000: 8B 01              mov         eax,dword ptr [ecx]
  00000002: C3                 ret

I think it's safe to say that these two functions are identical. Nevertheless, here's the output from the program:

c.getX1() = 42
c.getX2() = 1

Obviously "this" is not stored in ECX when the second Get function is invoked, so my question is: How do I ensure that class functions containing inline assembly follow the calling convention and/or are invoked the same way as regular/non-inlined functions?

EDIT: The main function is translated like this:

_main:
  00000000: 51                 push        ecx
  00000001: 6A 2A              push        2Ah
  00000003: 68 00 00 00 00     push        offset $SG3948
  00000008: E8 00 00 00 00     call        _printf
  0000000D: 83 C4 08           add         esp,8
  00000010: 8D 0C 24           lea         ecx,[esp]
  00000013: E8 00 00 00 00     call        ?getX2@TestClass@@QAEJXZ
  00000018: 50                 push        eax
  00000019: 68 00 00 00 00     push        offset $SG3949
  0000001E: E8 00 00 00 00     call        _printf
  00000023: 33 C0              xor         eax,eax
  00000025: 83 C4 0C           add         esp,0Ch
  00000028: C3                 ret
Dragonion asked Aug 30 '12 09:08



1 Answer

I don't know whether you're misreading the documentation, or whether it's poorly written, but __thiscall does not mean that the this pointer is stored in ECX; it means that the pointer to the object is passed in ECX. In larger functions, I've seen it move from one register to another in different places in the function, and in some cases, I've seen it spilled to memory. You cannot count on it being in ECX. And where it will be can change depending on other code in the function, and the optimization flags passed to the compiler.

In your case, the issue is further complicated by the fact that your functions are inline, and probably have been inlined (although _asm may inhibit inlining). Constant propagation, a very simple and widely used optimization technique, will almost certainly mean that your call to c.getX1() will just use 42, with no function call and no access to c whatsoever. Your listing for main bears this out: the first printf is passed the literal 2Ah, and nothing ever stores 42 into the stack slot at [esp] that ECX points to when getX2 is called.

In general, inline assembler is a tricky issue, precisely because you don't know what registers the compiler is using for what. Normally, in addition to the actual assembler instructions, there will be directives to tell the compiler which registers and which variables you use, and you will be able to refer to the variables themselves in the assembler. Unless you use these, you can assume very, very little with regard to inline assembler.
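As a rough illustration of what such directives look like, here is a sketch in GCC/Clang extended asm, which is a different compiler and syntax from the Visual C++ _asm block in the question. It assumes a GCC-compatible compiler targeting x86 with the default AT&T operand order; the names ConstraintDemo, getViaAsm and checkConstraintDemo are made up for the example. On any other compiler the function falls back to plain C++.

```cpp
#include <cassert>

class ConstraintDemo
{
    long x;

public:
    explicit ConstraintDemo(long x) : x(x) {}

    long getPlain() const { return x; }

    long getViaAsm() const
    {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        long result;
        // "=r"(result): the compiler picks a general register for the output.
        // "m"(x): the member x is passed as a memory operand, so the compiler
        // knows the asm reads this->x and cannot drop the store that set it.
        __asm__("mov %1, %0" : "=r"(result) : "m"(x));
        return result;
#else
        return x; // no extended asm available: plain C++ fallback
#endif
    }
};

// Hypothetical helper: both accessors must agree on the stored value.
bool checkConstraintDemo()
{
    ConstraintDemo c(42);
    assert(c.getViaAsm() == c.getPlain());
    return c.getViaAsm() == 42;
}
```

No register is named anywhere in the asm text; the operand list tells the compiler what is read and written, and it chooses the registers itself.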

But each compiler has its own rules, often with special syntax. Something like mov eax, [ecx].x, for example, or mov eax, x, might be what the Microsoft inline assembler needs. At any rate, there's no way from what you've written that the compiler could possibly deduce that you're accessing c.x. And since all other uses have been eliminated by constant propagation, it would be a very poor compiler which even generated a variable c.

EDIT:

FWIW: The documentation of Microsoft's inline assembler is at http://msdn.microsoft.com/en-us/library/4ks26t93%28v=vs.71%29.aspx. I haven't looked at it in detail, but there is a section about "Using C or C++ Symbols in __asm Blocks". This will probably explain how you can access x in the inline assembler in a way that the compiler will know that x has been accessed.
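For what it's worth, here is an untested sketch of what referring to C++ symbols from the inline assembler might look like. It assumes 32-bit Visual C++ (the _MSC_VER and _M_IX86 macros), that `this` can be named as an operand inside a member function, and that the `[reg]Type.member` form yields the member's offset; check all of that against the documentation above. The class name SymbolDemo is made up. On any other compiler or target the code falls back to plain C++.

```cpp
#include <cassert>

class SymbolDemo
{
    long x;

public:
    explicit SymbolDemo(long x) : x(x) {}

    long getX() const
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        long result;
        __asm
        {
            mov ecx, this               // ask the compiler for the object pointer by name
            mov eax, [ecx]SymbolDemo.x  // member offset is supplied by the compiler
            mov result, eax             // hand the value back through a C++ local
        }
        return result;
#else
        return x; // not 32-bit Visual C++: plain C++ fallback
#endif
    }
};
```

The point of the sketch is that nothing relies on ECX still holding `this` on entry: the pointer is reloaded by name, and the result is returned through an ordinary local rather than being left in EAX.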

James Kanze answered Nov 13 '22 07:11
