Disassembling virtual methods in multiple inheritance. How does the vtable work?

Tags:

c++

assembly

vtable

Assuming the following C++ source file:

#include <stdio.h>

class BaseTest {
  public:
  int a;

  BaseTest(): a(2){}

  virtual int gB() {
    return a;
  };
};

class SubTest: public BaseTest {
  public:
  int b;

  SubTest(): b(4){}
};

class TriTest: public BaseTest {
  public:
  int c;
  TriTest(): c(42){}
};

class EvilTest: public SubTest, public TriTest {
  public:
  virtual int gB(){
    return b;
  }
};

int main(){
  EvilTest * t2 = new EvilTest;

  TriTest * t3 = t2;

  printf("%d\n",t3->gB());
  printf("%d\n",t2->gB());
  return 0;
}

-fdump-class-hierarchy gives me:

[...]
Vtable for EvilTest
EvilTest::_ZTV8EvilTest: 6u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI8EvilTest)
16    (int (*)(...))EvilTest::gB
24    (int (*)(...))-16
32    (int (*)(...))(& _ZTI8EvilTest)
40    (int (*)(...))EvilTest::_ZThn16_N8EvilTest2gBEv

Class EvilTest
   size=32 align=8
   base size=32 base align=8
EvilTest (0x0x7f1ba98a8150) 0
    vptr=((& EvilTest::_ZTV8EvilTest) + 16u)
  SubTest (0x0x7f1ba96df478) 0
      primary-for EvilTest (0x0x7f1ba98a8150)
    BaseTest (0x0x7f1ba982ba80) 0
        primary-for SubTest (0x0x7f1ba96df478)
  TriTest (0x0x7f1ba96df4e0) 16
      vptr=((& EvilTest::_ZTV8EvilTest) + 40u)
    BaseTest (0x0x7f1ba982bae0) 16
        primary-for TriTest (0x0x7f1ba96df4e0)

Disassembly shows:

34  int main(){
   0x000000000040076d <+0>: push   rbp
   0x000000000040076e <+1>: mov    rbp,rsp
   0x0000000000400771 <+4>: push   rbx
   0x0000000000400772 <+5>: sub    rsp,0x18

35    EvilTest * t2 = new EvilTest;
   0x0000000000400776 <+9>: mov    edi,0x20
   0x000000000040077b <+14>:    call   0x400670 <_Znwm@plt>
   0x0000000000400780 <+19>:    mov    rbx,rax
   0x0000000000400783 <+22>:    mov    rdi,rbx
   0x0000000000400786 <+25>:    call   0x4008a8 <EvilTest::EvilTest()>
   0x000000000040078b <+30>:    mov    QWORD PTR [rbp-0x18],rbx

36    
37    TriTest * t3 = t2;
   0x000000000040078f <+34>:    cmp    QWORD PTR [rbp-0x18],0x0
   0x0000000000400794 <+39>:    je     0x4007a0 <main()+51>
   0x0000000000400796 <+41>:    mov    rax,QWORD PTR [rbp-0x18]
   0x000000000040079a <+45>:    add    rax,0x10
   0x000000000040079e <+49>:    jmp    0x4007a5 <main()+56>
   0x00000000004007a0 <+51>:    mov    eax,0x0
   0x00000000004007a5 <+56>:    mov    QWORD PTR [rbp-0x20],rax

38    
39    printf("%d\n",t3->gB());
   0x00000000004007a9 <+60>:    mov    rax,QWORD PTR [rbp-0x20]
   0x00000000004007ad <+64>:    mov    rax,QWORD PTR [rax]
   0x00000000004007b0 <+67>:    mov    rax,QWORD PTR [rax]
   0x00000000004007b3 <+70>:    mov    rdx,QWORD PTR [rbp-0x20]
   0x00000000004007b7 <+74>:    mov    rdi,rdx
   0x00000000004007ba <+77>:    call   rax
   0x00000000004007bc <+79>:    mov    esi,eax
   0x00000000004007be <+81>:    mov    edi,0x400984
   0x00000000004007c3 <+86>:    mov    eax,0x0
   0x00000000004007c8 <+91>:    call   0x400640 <printf@plt>

40    printf("%d\n",t2->gB());
   0x00000000004007cd <+96>:    mov    rax,QWORD PTR [rbp-0x18]
   0x00000000004007d1 <+100>:   mov    rax,QWORD PTR [rax]
   0x00000000004007d4 <+103>:   mov    rax,QWORD PTR [rax]
   0x00000000004007d7 <+106>:   mov    rdx,QWORD PTR [rbp-0x18]
   0x00000000004007db <+110>:   mov    rdi,rdx
   0x00000000004007de <+113>:   call   rax
   0x00000000004007e0 <+115>:   mov    esi,eax
   0x00000000004007e2 <+117>:   mov    edi,0x400984
   0x00000000004007e7 <+122>:   mov    eax,0x0
   0x00000000004007ec <+127>:   call   0x400640 <printf@plt>

41    return 0;
   0x00000000004007f1 <+132>:   mov    eax,0x0

42  }
   0x00000000004007f6 <+137>:   add    rsp,0x18
   0x00000000004007fa <+141>:   pop    rbx
   0x00000000004007fb <+142>:   pop    rbp
   0x00000000004007fc <+143>:   ret

Now that you've had suitable time to recover from the deadly diamond in the first code block, the actual question.

When t3->gB() is called I see the following disas (t3 is of type TriTest*, and gB() is the virtual method EvilTest::gB()):

   0x00000000004007a9 <+60>:    mov    rax,QWORD PTR [rbp-0x20]
   0x00000000004007ad <+64>:    mov    rax,QWORD PTR [rax]
   0x00000000004007b0 <+67>:    mov    rax,QWORD PTR [rax]
   0x00000000004007b3 <+70>:    mov    rdx,QWORD PTR [rbp-0x20]
   0x00000000004007b7 <+74>:    mov    rdi,rdx
   0x00000000004007ba <+77>:    call   rax

The first mov loads t3 (the address of the object) into rax, and the next dereferences it to get the vptr (now we're in the vtable).

The one after that dereferences the vptr to get a pointer to the function, and at the bottom of that paste it's called.

So far so good, but this brings a few questions.

Where's this?
I presume this is loaded into rdi via the movs at +70 and +74, but that's the same pointer that was dereferenced to reach the vtable, which means it points to a TriTest subobject, which shouldn't have SubTest's b member at all. Does the Linux thiscall convention handle virtual casting inside the called method as opposed to outside?

This was answered by rodrigo below.

How do I disassemble the virtual method?
If I knew this I could answer the previous question myself. disas EvilTest::gB gives me:

Cannot reference virtual member function "gB"

Setting a breakpoint before the call, running info reg rax and disassembling that gives me:

(gdb) info reg rax
rax            0x4008a1 4196513
(gdb) disas 0x4008a14196513
No function contains specified address.
(gdb) disas *0x4008a14196513
Cannot access memory at address 0x4008a14196513

Why are the vtables (apparently) only 8 bytes away from each other?
The fdump says there are 16 bytes between the first and second &vtable (which fits the 64-bit pointer and 2 ints), but the disassembly from the second gB() call is:

   0x00000000004007cd <+96>:    mov    rax,QWORD PTR [rbp-0x18]
   0x00000000004007d1 <+100>:   mov    rax,QWORD PTR [rax]
   0x00000000004007d4 <+103>:   mov    rax,QWORD PTR [rax]
   0x00000000004007d7 <+106>:   mov    rdx,QWORD PTR [rbp-0x18]
   0x00000000004007db <+110>:   mov    rdi,rdx
   0x00000000004007de <+113>:   call   rax

[rbp-0x18] is only 8 bytes away from the previous call ([rbp-0x20]). What's going on?

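Answered by 500 in the comments:

I forgot the objects were heap-allocated; only their pointers are on the stack. [rbp-0x18] and [rbp-0x20] are just the local variables t2 and t3, two 8-byte pointers sitting next to each other.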


asked May 07 '14 22:05

J V


Disclaimer: I'm no expert in GCC internals, but I'll try to explain what I think is going on. Also note that you are not using virtual inheritance but plain multiple inheritance, so your EvilTest object actually contains two BaseTest subobjects. You can see that this is the case by trying to use this->a in EvilTest: you'll get an ambiguous reference error.

First of all, be aware that every VTable has two entries at negative offsets:

[-2]: the offset-to-top, i.e. the distance from this vptr back to the start of the complete object (more on this later).
[-1]: pointer to the run-time type information for this class.

Then, from [0] on, come the pointers to the virtual functions.
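To make the negative slots concrete, here is a minimal sketch that reads them back at run time. It assumes the Itanium C++ ABI used by GCC and Clang on 64-bit Linux (the vptr is the first word of a polymorphic object, slot [-2] holds the offset-to-top and slot [-1] the RTTI pointer); reading a vtable like this is not sanctioned by ISO C++ and is for exploration only. The hierarchy is a condensed copy of the one in the question, and VtableHeader and read_vtable_header are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Hypothetical: the two header slots just before the address held in a vptr.
struct VtableHeader {
    std::ptrdiff_t offset_to_top;   // slot [-2]
    const std::type_info* rtti;     // slot [-1]
};

// Assumes the Itanium C++ ABI: the first word of a polymorphic (sub)object
// is its vptr, and the vptr points at slot [0] of the vtable. Not portable.
VtableHeader read_vtable_header(const void* subobject) {
    const std::ptrdiff_t* vptr =
        *static_cast<const std::ptrdiff_t* const*>(subobject);
    VtableHeader h;
    h.offset_to_top = vptr[-2];
    h.rtti = reinterpret_cast<const std::type_info*>(vptr[-1]);
    return h;
}
```

On such a target, passing t2 (an EvilTest*) should report an offset-to-top of 0 and passing t3 (the TriTest* obtained from it) should report -16, with both RTTI pointers equal to &typeid(EvilTest). That matches the 0, -16 and _ZTI8EvilTest entries in the -fdump-class-hierarchy output.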

With that in mind, I'll write out the VTables of the classes, with easy-to-read names:

VTable for BaseTest:

[-2]: 0
[-1]: typeof(BaseTest)
[ 0]: BaseTest::gB

VTable for SubTest:

[-2]: 0
[-1]: typeof(SubTest)
[ 0]: BaseTest::gB

VTable for TriTest

[-2]: 0
[-1]: typeof(TriTest)
[ 0]: BaseTest::gB

Up until this point nothing too interesting.

VTable for EvilTest

[-2]: 0
[-1]: typeof(EvilTest)
[ 0]: EvilTest::gB
[ 1]: -16
[ 2]: typeof(EvilTest)
[ 3]: EvilTest::thunk_gB

Now that is interesting! It is easier to see it working:

EvilTest * t2 = new EvilTest;
t2->gB();

This code calls the function at VTable[0], that is simply EvilTest::gB and all goes fine.
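For illustration, the same dispatch can be spelled out by hand, mirroring the instructions at main()+96..+113 in the question. This is a sketch under Itanium ABI assumptions (vptr in the first word, gB() in slot 0, this passed as the hidden first argument, which is rdi on x86-64); GbFn and call_gB_by_hand are hypothetical names, and calling a member function through a plain function pointer only works because of those ABI details.

```cpp
#include <cassert>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Hypothetical alias: at the machine level slot 0 behaves like a plain
// function that receives `this` as its first argument.
using GbFn = int (*)(EvilTest*);

// Mirrors main()+96..+113 for t2->gB(). Relies on Itanium ABI details.
int call_gB_by_hand(EvilTest* t2) {
    void* const* vptr = *reinterpret_cast<void* const* const*>(t2);  // load the vptr
    GbFn fn = reinterpret_cast<GbFn>(vptr[0]);                       // load slot [0]
    return fn(t2);                                                   // call with this = t2
}
```

Called on a freshly constructed EvilTest, this should return the same value as t2->gB(), namely SubTest::b (4).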

But then you do:

TriTest * t3 = t2;

Since TriTest is not the first base class of EvilTest, the actual binary value of t3 is different from that of t2. That is, the conversion advances the pointer by N bytes. The exact amount is known at compile time, because it depends only on the static types of the expressions; in your code it is 16 bytes. Note that if the pointer is NULL it must not be advanced, hence the branch in the disassembly.
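The conversion at main()+34..+56 can be written out as a hypothetical helper to show both the +16 and the null check side by side with the real conversion. The 0x10 constant is only valid for this particular layout on a 64-bit Itanium-ABI target; real code should just write TriTest * t3 = t2; and let the compiler do it. Both function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Hypothetical hand-written copy of main()+34..+56.
std::uintptr_t upcast_address_like_gcc(const EvilTest* p) {
    if (p == nullptr) return 0;                          // cmp/je: null stays null
    return reinterpret_cast<std::uintptr_t>(p) + 0x10;   // add rax,0x10
}

// The conversion the compiler actually performs, as an address.
std::uintptr_t upcast_address(const EvilTest* p) {
    const TriTest* t3 = p;                               // derived-to-base conversion
    return reinterpret_cast<std::uintptr_t>(t3);
}
```

The two should agree for a real object, and both should map a null pointer to a null pointer.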

At this point it is interesting to see the memory layout of the EvilTest object:

[ 0]: pointer to VTable of EvilTest-as-BaseTest
[ 1]: BaseTest::a
[ 2]: SubTest::b
[ 3]: pointer to VTable of EvilTest-as-TriTest
[ 4]: BaseTest::a
[ 5]: TriTest::c

As you can see, when you cast an EvilTest* to a TriTest* you have to advance this to element [3], that is 8+4+4 = 16 bytes on a 64-bit system.
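A small sketch that measures this layout instead of assuming it. It computes byte offsets by subtracting addresses, because offsetof is only guaranteed for standard-layout types and EvilTest is not one. The expected values in the comments assume a 64-bit Itanium-ABI target with 4-byte int; byte_offset, EvilLayout and layout_of are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Byte distance from the start of `obj` to `member`.
std::ptrdiff_t byte_offset(const EvilTest& obj, const void* member) {
    return static_cast<const char*>(member) - reinterpret_cast<const char*>(&obj);
}

struct EvilLayout {
    std::ptrdiff_t sub_a;  // SubTest's BaseTest::a   (expected 8)
    std::ptrdiff_t sub_b;  // SubTest::b              (expected 12)
    std::ptrdiff_t tri;    // TriTest subobject/vptr  (expected 16)
    std::ptrdiff_t tri_a;  // TriTest's BaseTest::a   (expected 24)
    std::ptrdiff_t tri_c;  // TriTest::c              (expected 28)
};

EvilLayout layout_of(const EvilTest& obj) {
    const SubTest& s = obj;  // each path has exactly one BaseTest, so .a
    const TriTest& t = obj;  // is unambiguous through these references
    return {byte_offset(obj, &s.a), byte_offset(obj, &obj.b),
            byte_offset(obj, &t), byte_offset(obj, &t.a),
            byte_offset(obj, &obj.c)};
}
```

Going through the SubTest and TriTest references is also how you pick one of the two a members; obj.a on its own is the ambiguous reference mentioned at the top of this answer.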

t3->gB();

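Now you use that pointer to call gB(). That is done using element [0] of the VTable that t3's vptr points to, as before; but that vptr points at element [3] of the full EvilTest table, so the function found there is EvilTest::thunk_gB. Since the function that must actually run is EvilTest::gB(), the this pointer has to be moved back 16 bytes before it can be called. That is the work of EvilTest::thunk_gB() (_ZThn16_N8EvilTest2gBEv in the dump), a little function that subtracts 16 from this and jumps to EvilTest::gB(). The 16 is encoded in the thunk itself (the n16 in the mangled name means "adjust this by -16"); the -16 stored at VTable[-2] is the offset-to-top, which dynamic_cast uses to find the complete object. Now everything matches!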
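Both halves of that can be checked with a sketch: calling slot [0] of t3's vtable with the unadjusted TriTest* still yields SubTest::b, because the function stored there is the thunk, and a hand-written equivalent of the thunk just moves this back and forwards the call. Same Itanium ABI assumptions as above; thunk_gB_by_hand, SlotFn and call_through_secondary_slot are hypothetical names, and the real thunk GCC emits is machine code, typically something like sub rdi,16 followed by a jmp to EvilTest::gB.

```cpp
#include <cassert>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Hypothetical C++ equivalent of the non-virtual thunk: adjust `this` from
// the TriTest subobject back to the full EvilTest, then forward the call.
int thunk_gB_by_hand(TriTest* self) {
    EvilTest* whole = static_cast<EvilTest*>(self);  // subtracts 16 on this layout
    return whole->EvilTest::gB();                    // qualified call: no dispatch
}

// Mirrors main()+60..+77: slot [0] of the TriTest-in-EvilTest vtable is
// called with the *unadjusted* TriTest* as `this`.
using SlotFn = int (*)(TriTest*);
int call_through_secondary_slot(TriTest* t3) {
    void* const* vptr = *reinterpret_cast<void* const* const*>(t3);
    return reinterpret_cast<SlotFn>(vptr[0])(t3);
}
```

Both functions should return 4 for a TriTest* that points into an EvilTest, which is exactly why t3->gB() in the question prints SubTest::b.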

It is worth noting that the full VTable of EvilTest is the concatenation of the VTable of EvilTest-as-BaseTest plus the VTable of EvilTest-as-TriTest.
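That concatenation is visible in the -fdump-class-hierarchy output in the question, where the two vptrs are initialised to _ZTV8EvilTest + 16u and _ZTV8EvilTest + 40u, i.e. 24 bytes or three 8-byte slots apart. A sketch that checks this at run time, again assuming the Itanium C++ ABI on a 64-bit target (vptr_of and secondary_vptr_distance are made-up helper names):

```cpp
#include <cassert>
#include <cstddef>

// Condensed copy of the question's hierarchy (same members, same values).
struct BaseTest { int a = 2; virtual int gB() { return a; } };
struct SubTest : BaseTest { int b = 4; };
struct TriTest : BaseTest { int c = 42; };
struct EvilTest : SubTest, TriTest { int gB() override { return b; } };

// Returns the vptr stored in the first word of a polymorphic (sub)object.
void* const* vptr_of(const void* subobject) {
    return *static_cast<void* const* const*>(subobject);
}

// Distance, in vtable slots, from the primary (EvilTest/SubTest) vptr to the
// secondary (TriTest-in-EvilTest) vptr inside the same vtable group.
std::ptrdiff_t secondary_vptr_distance(EvilTest& obj) {
    TriTest& as_tri = obj;
    return vptr_of(&as_tri) - vptr_of(&obj);
}
```

The distance should come out as 3 (offset-to-top, RTTI and gB of the first half), and slot [0] should differ between the two halves: one holds EvilTest::gB, the other the thunk.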


answered Nov 12 '22 00:11

rodrigo
