Assuming the following C++ source file:
#include <stdio.h>

class BaseTest {
public:
    int a;
    BaseTest(): a(2){}
    virtual int gB() {
        return a;
    }
};

class SubTest: public BaseTest {
public:
    int b;
    SubTest(): b(4){}
};

class TriTest: public BaseTest {
public:
    int c;
    TriTest(): c(42){}
};

class EvilTest: public SubTest, public TriTest {
public:
    virtual int gB(){
        return b;
    }
};

int main(){
    EvilTest * t2 = new EvilTest;

    TriTest * t3 = t2;

    printf("%d\n",t3->gB());
    printf("%d\n",t2->gB());
    return 0;
}
Compiling with -fdump-class-hierarchy gives me:
[...]
Vtable for EvilTest
EvilTest::_ZTV8EvilTest: 6u entries
0 (int (*)(...))0
8 (int (*)(...))(& _ZTI8EvilTest)
16 (int (*)(...))EvilTest::gB
24 (int (*)(...))-16
32 (int (*)(...))(& _ZTI8EvilTest)
40 (int (*)(...))EvilTest::_ZThn16_N8EvilTest2gBEv
Class EvilTest
size=32 align=8
base size=32 base align=8
EvilTest (0x0x7f1ba98a8150) 0
vptr=((& EvilTest::_ZTV8EvilTest) + 16u)
SubTest (0x0x7f1ba96df478) 0
primary-for EvilTest (0x0x7f1ba98a8150)
BaseTest (0x0x7f1ba982ba80) 0
primary-for SubTest (0x0x7f1ba96df478)
TriTest (0x0x7f1ba96df4e0) 16
vptr=((& EvilTest::_ZTV8EvilTest) + 40u)
BaseTest (0x0x7f1ba982bae0) 16
primary-for TriTest (0x0x7f1ba96df4e0)
Disassembly shows:
34 int main(){
0x000000000040076d <+0>: push rbp
0x000000000040076e <+1>: mov rbp,rsp
0x0000000000400771 <+4>: push rbx
0x0000000000400772 <+5>: sub rsp,0x18
35 EvilTest * t2 = new EvilTest;
0x0000000000400776 <+9>: mov edi,0x20
0x000000000040077b <+14>: call 0x400670 <_Znwm@plt>
0x0000000000400780 <+19>: mov rbx,rax
0x0000000000400783 <+22>: mov rdi,rbx
0x0000000000400786 <+25>: call 0x4008a8 <EvilTest::EvilTest()>
0x000000000040078b <+30>: mov QWORD PTR [rbp-0x18],rbx
36
37 TriTest * t3 = t2;
0x000000000040078f <+34>: cmp QWORD PTR [rbp-0x18],0x0
0x0000000000400794 <+39>: je 0x4007a0 <main()+51>
0x0000000000400796 <+41>: mov rax,QWORD PTR [rbp-0x18]
0x000000000040079a <+45>: add rax,0x10
0x000000000040079e <+49>: jmp 0x4007a5 <main()+56>
0x00000000004007a0 <+51>: mov eax,0x0
0x00000000004007a5 <+56>: mov QWORD PTR [rbp-0x20],rax
38
39 printf("%d\n",t3->gB());
0x00000000004007a9 <+60>: mov rax,QWORD PTR [rbp-0x20]
0x00000000004007ad <+64>: mov rax,QWORD PTR [rax]
0x00000000004007b0 <+67>: mov rax,QWORD PTR [rax]
0x00000000004007b3 <+70>: mov rdx,QWORD PTR [rbp-0x20]
0x00000000004007b7 <+74>: mov rdi,rdx
0x00000000004007ba <+77>: call rax
0x00000000004007bc <+79>: mov esi,eax
0x00000000004007be <+81>: mov edi,0x400984
0x00000000004007c3 <+86>: mov eax,0x0
0x00000000004007c8 <+91>: call 0x400640 <printf@plt>
40 printf("%d\n",t2->gB());
0x00000000004007cd <+96>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004007d1 <+100>: mov rax,QWORD PTR [rax]
0x00000000004007d4 <+103>: mov rax,QWORD PTR [rax]
0x00000000004007d7 <+106>: mov rdx,QWORD PTR [rbp-0x18]
0x00000000004007db <+110>: mov rdi,rdx
0x00000000004007de <+113>: call rax
0x00000000004007e0 <+115>: mov esi,eax
0x00000000004007e2 <+117>: mov edi,0x400984
0x00000000004007e7 <+122>: mov eax,0x0
0x00000000004007ec <+127>: call 0x400640 <printf@plt>
41 return 0;
0x00000000004007f1 <+132>: mov eax,0x0
42 }
0x00000000004007f6 <+137>: add rsp,0x18
0x00000000004007fa <+141>: pop rbx
0x00000000004007fb <+142>: pop rbp
0x00000000004007fc <+143>: ret
Now that you've had suitable time to recover from the deadly diamond in the first code block, the actual question.
When t3->gB() is called I see the following disassembly (t3 is of type TriTest*, gB() is the virtual method EvilTest::gB()):
0x00000000004007a9 <+60>: mov rax,QWORD PTR [rbp-0x20]
0x00000000004007ad <+64>: mov rax,QWORD PTR [rax]
0x00000000004007b0 <+67>: mov rax,QWORD PTR [rax]
0x00000000004007b3 <+70>: mov rdx,QWORD PTR [rbp-0x20]
0x00000000004007b7 <+74>: mov rdi,rdx
0x00000000004007ba <+77>: call rax
The first mov loads t3 from the stack into rax, and the next dereferences it to fetch the vtable pointer (now we're in the vtable). The one after that dereferences the vtable pointer to get a pointer to the function, and at the bottom of that paste it's called.
So far so good, but this brings a few questions.
Where's this?
I presume this is loaded into rdi via the movs at +70 and +74, but that's the same pointer that was used to find the vtable, which means it points to the TriTest subobject, which shouldn't have SubTest's b member at all. Does the Linux thiscall convention handle the virtual cast inside the called method rather than outside?
This was answered by rodrigo here
How do I disassemble the virtual method?
If I knew this I could answer the previous question myself. disas EvilTest::gB gives me:
Cannot reference virtual member function "gB"
Setting a breakpoint before the call, running info reg rax and disassembling that gives me:
(gdb) info reg rax
rax 0x4008a1 4196513
(gdb) disas 0x4008a14196513
No function contains specified address.
(gdb) disas *0x4008a14196513
Cannot access memory at address 0x4008a14196513
Why are the vtables (apparently) only 8 bytes away from each other?
The fdump says there are 16 bytes between the first and second &vtable (which fits the 64-bit pointer and 2 ints), but the disassembly of the second gB() call is:
0x00000000004007cd <+96>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004007d1 <+100>: mov rax,QWORD PTR [rax]
0x00000000004007d4 <+103>: mov rax,QWORD PTR [rax]
0x00000000004007d7 <+106>: mov rdx,QWORD PTR [rbp-0x18]
0x00000000004007db <+110>: mov rdi,rdx
0x00000000004007de <+113>: call rax
[rbp-0x18] is only 8 bytes away from the previous call ([rbp-0x20]). What's going on?
Answered by 500 in the comments:
I forgot the objects were heap-allocated; only their pointers are on the stack.
For every class that contains virtual functions, the compiler constructs a virtual table, a.k.a. vtable. The vtable contains an entry for each virtual function accessible by the class and stores a pointer to its definition. Only the most specific function definition callable by the class is stored in the vtable.
You can imagine what happens when you perform inheritance and override some of the virtual functions. The compiler creates a new VTABLE for your new class, and it inserts your new function addresses, using the base-class function addresses for any virtual functions you don't override.
A vtable is created when a class declaration contains a virtual function, or when a parent anywhere in the hierarchy has one; let's call that parent Y. Any parent of Y will not have a vtable (unless it gets a virtual function from somewhere else in its own hierarchy).
Working of virtual functions (concept of VTABLE and VPTR): when an object of such a class is created, a virtual pointer (VPTR) is inserted as a hidden data member of the object, pointing to the VTABLE of that class. Each new object gets its own virtual pointer.
Disclaimer: I'm no expert in the GCC internals, but I'll try to explain what I think is going on. Also note that you are not using virtual inheritance, but plain multiple inheritance, so your EvilTest object actually contains two BaseTest subobjects. You can see that is the case by trying to use this->a in EvilTest: you'll get an ambiguous reference error.
First of all, be aware that every VTable has 2 values at the negative offsets:
-2: the this offset (more on this later).
-1: pointer to the run-time type information for this class.
Then, from 0 on, there will be the pointers to the virtual functions.
With that in mind, I'll write the VTables of the classes, with easy-to-read names:
VTable for BaseTest:
[-2]: 0
[-1]: typeof(BaseTest)
[ 0]: BaseTest::gB
VTable for SubTest:
[-2]: 0
[-1]: typeof(SubTest)
[ 0]: BaseTest::gB
VTable for TriTest:
[-2]: 0
[-1]: typeof(TriTest)
[ 0]: BaseTest::gB
Up until this point nothing too interesting.
VTable for EvilTest:
[-2]: 0
[-1]: typeof(EvilTest)
[ 0]: EvilTest::gB
[ 1]: -16
[ 2]: typeof(EvilTest)
[ 3]: EvilTest::thunk_gB
Now that is interesting! It is easier to see it working:
EvilTest * t2 = new EvilTest;
t2->gB();
This code calls the function at VTable[0], which is simply EvilTest::gB, and all goes fine.
But then you do:
TriTest * t3 = t2;
Since TriTest is not the first base class of EvilTest, the actual binary value of t3 is different from that of t2. That is, the cast advances the pointer N bytes. The exact amount is known by the compiler at compile time, because it depends only on the static types of the expressions. In your code it is 16 bytes. Note that if the pointer is NULL, then it must not be advanced, hence the branch in the disassembly.
At this point it is interesting to see the memory layout of the EvilTest object:
[ 0]: pointer to VTable of EvilTest-as-BaseTest
[ 1]: BaseTest::a
[ 2]: SubTest::b
[ 3]: pointer to VTable of EvilTest-as-TriTest
[ 4]: BaseTest::a
[ 5]: TriTest::c
As you can see, when you cast an EvilTest* to a TriTest* you have to advance this to the element [3], that is 8+4+4 = 16 bytes on a 64-bit system.
t3->gB();
Now you use that pointer to call gB(). That is done using element [0] of the VTable, as before. But since that function is actually from EvilTest, the this pointer must be moved back 16 bytes before EvilTest::gB() can be called. That is the work of EvilTest::thunk_gB(): a little function that applies the this offset (the VTable[-2] value, here -16) to this and then jumps to the real function. In the dump above the thunk appears as _ZThn16_N8EvilTest2gBEv, whose mangled name already encodes the 16-byte adjustment. Now everything matches!
It is worth noting that the full VTable of EvilTest is the concatenation of the VTable of EvilTest-as-BaseTest plus the VTable of EvilTest-as-TriTest.