 

How to obtain a pointer out of a C++ vtable?

Say you have a C++ class like:

class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

The C++ compiler translates a call into a vtable lookup:

Foo* foo;

// Translated by C++ to:
//   foo->vtable->DoSomething(foo);
foo->DoSomething();
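To make that translation concrete, here is a hand-written model of it in plain C++. It is only a sketch under simplifying assumptions: single inheritance, the table pointer as the object's first member, and hypothetical names (FooVTable, FooObject, ImplDoSomething). DoSomething returns an int here purely so the result can be checked. A real compiler-generated vtable also carries extra entries such as RTTI, and under the Itanium C++ ABI a virtual destructor occupies two slots.

```cpp
#include <cassert>

// Hand-written model of a single-inheritance vtable (names are hypothetical).
struct FooObject;

struct FooVTable {
  void (*destroy)(FooObject* self);      // stands in for ~Foo()
  int (*do_something)(FooObject* self);  // stands in for DoSomething()
};

struct FooObject {
  const FooVTable* vtable;  // hidden first member, like the compiler's vptr
  int calls;                // ordinary per-object data
};

int ImplDoSomething(FooObject* self) { return ++self->calls; }
void ImplDestroy(FooObject* self) { self->calls = 0; }

// One table per "class", shared by every instance of that class.
const FooVTable kImplVTable = {&ImplDestroy, &ImplDoSomething};

// What foo->DoSomething() lowers to: load the table, load the slot, call it.
int CallDoSomething(FooObject* foo) { return foo->vtable->do_something(foo); }

// What a JIT could emit once it knows the target: a direct call.
int CallDoSomethingDirect(FooObject* foo) { return ImplDoSomething(foo); }
```

The indirect version pays for two dependent loads plus an indirect branch; the direct version is what the question is trying to generate.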

Suppose I were writing a JIT compiler and wanted to obtain the address of the DoSomething() function for a particular instance of class Foo, so that I could generate code that jumps to it directly instead of doing a table lookup and an indirect branch.

My questions are:

  1. Is there any standard C++ way to do this (I'm almost sure the answer is no, but wanted to ask for the sake of completeness).

  2. Is there any remotely compiler-independent way of doing this, like a library someone has implemented that provides an API for accessing a vtable?

I'm completely open to hacks, if they will work. For example, if I created my own derived class and could determine the address of its DoSomething method, I could assume that the vtable is the first (hidden) member of Foo and search through its vtable until I find my pointer value. However, I don't know a way of getting this address: if I write &DerivedFoo::DoSomething I get a pointer-to-member, which is something totally different.

Maybe I could turn the pointer-to-member into the vtable offset. When I compile the following:

class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

void foo(Foo *f, void (Foo::*member)()) {
  (f->*member)();
}

On GCC/x86-64, I get this assembly output:

Disassembly of section .text:

0000000000000000 <_Z3fooP3FooMS_FvvE>:
   0:   40 f6 c6 01             test   sil,0x1
   4:   48 89 74 24 e8          mov    QWORD PTR [rsp-0x18],rsi
   9:   48 89 54 24 f0          mov    QWORD PTR [rsp-0x10],rdx
   e:   74 10                   je     20 <_Z3fooP3FooMS_FvvE+0x20>
  10:   48 01 d7                add    rdi,rdx
  13:   48 8b 07                mov    rax,QWORD PTR [rdi]
  16:   48 8b 74 30 ff          mov    rsi,QWORD PTR [rax+rsi*1-0x1]
  1b:   ff e6                   jmp    rsi
  1d:   0f 1f 00                nop    DWORD PTR [rax]
  20:   48 01 d7                add    rdi,rdx
  23:   ff e6                   jmp    rsi

I don't fully understand what's going on here, but if I could reverse-engineer this or use an ABI spec I could generate a fragment like the above for each separate platform, as a way of obtaining a pointer out of a vtable.
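For what it's worth, the disassembly matches the pointer-to-member-function layout in the Itanium C++ ABI, which GCC and Clang use on x86-64: the pointer-to-member is a pair (ptr, adj), passed here in rsi and rdx. If the low bit of ptr is set, the member is virtual and ptr - 1 is the byte offset of its slot in the vtable; on both paths adj is added to `this` before the jump. Below is a sketch that performs the same decoding in C++ instead of assembly. It assumes x86-64 with GCC or Clang (ARM and MSVC encode member pointers differently), a vtable pointer at offset 0 of the adjusted object, and that `this` is passed as the first argument when the resulting address is called. ItaniumPmf, ResolveMember, Impl, and g_calls are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if !defined(__x86_64__) || !defined(__GNUC__)
#error "This sketch assumes the Itanium C++ ABI on x86-64 (GCC or Clang)."
#endif

class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

// Itanium C++ ABI representation of a pointer to member function.
struct ItaniumPmf {
  std::uintptr_t ptr;  // code address, or 1 + byte offset of the vtable slot
  std::ptrdiff_t adj;  // adjustment added to `this` before the call
};

// Returns the address that (obj->*member)() would jump to, following the
// same steps as the disassembly: test the low bit, adjust `this`, load the
// vptr, then load the slot.
void* ResolveMember(Foo* obj, void (Foo::*member)()) {
  static_assert(sizeof(member) == sizeof(ItaniumPmf), "unexpected PMF size");
  ItaniumPmf pmf;
  std::memcpy(&pmf, &member, sizeof pmf);
  if ((pmf.ptr & 1) == 0) {
    return reinterpret_cast<void*>(pmf.ptr);  // non-virtual: plain address
  }
  char* self = reinterpret_cast<char*>(obj) + pmf.adj;
  char* vtable = *reinterpret_cast<char**>(self);  // vptr at offset 0
  return *reinterpret_cast<void**>(vtable + (pmf.ptr - 1));
}

// Hypothetical concrete class used to exercise the sketch.
int g_calls = 0;
class Impl : public Foo {
 public:
  void DoSomething() override { ++g_calls; }
};
```

The returned address is where `jmp rsi` would land for that object, so a JIT could emit a direct call to it. Calling it through a plain function pointer relies on `this` being the first argument, which is an ABI detail rather than anything standard C++ guarantees.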

asked Feb 24 '11 by Josh Haberman




1 Answer

First, each class type with virtual functions has a vtable, and each instance of that type holds a pointer to it. This means that if the contents of a type's vtable change, all instances of that type are affected, but a specific instance can also have its own vtable pointer changed.

There is no standard way to retrieve the vtable pointer from an instance, because it depends on the compiler's implementation. See this post for more details. However, G++ and MSVC++ seem to lay out class objects as described on Wikipedia. Classes can have pointers to multiple vtables (for example, under multiple inheritance). For the sake of simplicity, I'll talk about classes that only have one vtable pointer.
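As a quick illustration of the multiple-vtable case, the sketch below uses hypothetical classes Left, Right, and Both. It assumes the common GCC/Clang/MSVC layout in which each polymorphic base subobject begins with its own vtable pointer; the standard does not require this.

```cpp
#include <cassert>
#include <cstddef>

// Two unrelated polymorphic bases: under the usual layouts, each base
// subobject carries its own hidden vtable pointer.
struct Left  { virtual int L() { return 1; } virtual ~Left() = default; };
struct Right { virtual int R() { return 2; } virtual ~Right() = default; };
struct Both : Left, Right {};

// Byte distance between a Both object and its Right subobject.
std::ptrdiff_t RightSubobjectOffset(Both* b) {
  return reinterpret_cast<char*>(static_cast<Right*>(b)) -
         reinterpret_cast<char*>(b);
}
```

With no data members, a Both object is just two vtable pointers, and converting a Both* to a Right* moves the pointer past the first one.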

Getting a function pointer out of the vtable can be as simple as this:

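// The vptr is the object's hidden first member. Use void*, not int:
// an int is too narrow to hold a pointer on 64-bit targets.
void** cVtablePtr = *(void***)c;
void* doSomethingPtr = cVtablePtr[1];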

Here c is a C* pointing to an instance of class C, given these class definitions:

#include <iostream>
using std::cout;
using std::endl;

class A
{
public:
    virtual void A1() { cout << "A->A1" << endl; }
    virtual void DoSomething() { cout << "DoSomething" << endl; }
};

class C : public A
{
public:
    virtual void A1() { cout << "C->A1" << endl; }
    virtual void C1() { cout << "C->C1" << endl; }
};

In this case, a C object is laid out like a struct whose first (hidden) member is the pointer to its vtable, so slot 0 holds C::A1 and slot 1 holds the inherited A::DoSomething.
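Here is a self-contained, checkable variant of the snippet above. It records which function ran in a global string (g_last, a name introduced here) instead of printing, and wraps the lookup in a hypothetical helper, VtableSlot. It relies on the same assumptions as the answer: one vptr at offset 0 and no virtual destructor, so C's slots are 0 = C::A1, 1 = A::DoSomething, 2 = C::C1. It also assumes `this` is passed as the first argument when a slot is called as a plain function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string g_last;  // records which function ran, for checking

class A {
 public:
  virtual void A1() { g_last = "A->A1"; }
  virtual void DoSomething() { g_last = "DoSomething"; }
};

class C : public A {
 public:
  void A1() override { g_last = "C->A1"; }
  virtual void C1() { g_last = "C->C1"; }
};

// Reads slot `index` of the vtable that obj's hidden first member points to.
void* VtableSlot(const void* obj, std::size_t index) {
  void* const* vtable = *static_cast<void* const* const*>(obj);
  return vtable[index];
}
```

Calling slot 1 on a C runs A::DoSomething, while slot 0 runs the override C::A1, which is exactly the "most-derived function" behavior a virtual call would give.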

In the case of a JIT compiler, it might be possible to cache the vtable lookup by regenerating code.

At first the JIT compiler might produce this:

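// Illustrative parameter types; assumes `this` is passed as the first argument.
void** vtable = *(void***)((char*)obj_instance + vtable_offset);
void* func_ptr = vtable[function_offset];
((void (*)(void*, int, int))func_ptr)(obj_instance, param1, param2);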

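Once func_ptr is known, the JIT can discard that code and hard-code the function address into the compiled code, as long as that path is only taken for objects whose vtable pointer matches the one the address was read from (for example, by emitting a cheap guard that compares vtable pointers):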

hardcoded_func_ptr(obj_instance, param1, param2);
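A rough sketch of that idea in C++ follows, written as a one-entry inline cache rather than actual machine code. The guard compares the object's vtable pointer with the one the target was read from, so the cached address is only used for objects of the same dynamic type. Shape, Triangle, Square, InlineCache, and CallSides are hypothetical names; the sketch assumes a vptr at offset 0, that Sides is slot 0 (Shape deliberately has no virtual destructor), and that `this` is passed as the first argument.

```cpp
#include <cassert>

struct Shape {
  virtual int Sides() const = 0;  // slot 0 in this sketch
};
struct Triangle : Shape { int Sides() const override { return 3; } };
struct Square : Shape { int Sides() const override { return 4; } };

// One-entry cache, the kind of state a JIT might keep next to a call site.
struct InlineCache {
  const void* cached_vptr = nullptr;  // vptr the target was read from
  void* cached_target = nullptr;      // code address found in that vtable
  int misses = 0;
};

const void* ReadVptr(const Shape* s) {
  return *reinterpret_cast<const void* const*>(s);  // hidden first member
}

using SidesFn = int (*)(const Shape*);  // `this` as the first argument

int CallSides(InlineCache& ic, const Shape* s) {
  const void* vptr = ReadVptr(s);
  if (vptr != ic.cached_vptr) {
    // Miss: do the full vtable lookup and remember the result.
    ++ic.misses;
    ic.cached_vptr = vptr;
    ic.cached_target = static_cast<void* const*>(vptr)[0];
  }
  // Guard passed: call the cached address directly.
  return reinterpret_cast<SidesFn>(ic.cached_target)(s);
}
```

A real JIT would bake cached_target into the emitted call instruction and keep only the vptr comparison as the guard; objects of the same class share one vtable, so repeated calls on them hit the cache.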

One thing I should note: while you can overwrite an instance's vtable pointer, it is not always possible to overwrite the contents of a vtable. For example, on Windows the vtable is placed in read-only memory, whereas on OS X it is read/write. So on Windows, trying to change the contents of the vtable will result in an access violation unless you first change the page protection with VirtualProtect.

answered Oct 16 '22 by Evan