c++ virtual table lookup - how does it search & replace

Question

Let's take the example below:

class Base {
public:
    virtual string function1() { return "Base - function1"; }
    virtual string function2() { return "Base - function2"; }
};

class Derived : public Base {
public:
    virtual string function2() { return "Derived - function2"; }
    virtual string function1() { return "Derived - function1"; }
    string function3() { return "Derived - function3"; }
};

So the vtable structure is like

Base-vTable
-----------------------
name_of_function address_of_function
function1   &function1
function2   &function2
-----------------------
-----------------------
Derived-vTable
-----------------------
name_of_function address_of_function
function1   &function1
function2   &function2

or is it like

    Base-vTable
-----------------------
    Offset function
    +0  function1
    +4  function2
-----------------------
-----------------------
    Derived-vTable
-----------------------
    Offset function
    +0  function1
    +4  function2

If it is like the latter, then what is that offset, and where is it used?

And the function name: is it the mangled function name? If it is mangled, then the base and derived mangled names won't match and the vtable lookup won't work. The compiler does mangle all virtual function names, so it must be a mangled name. Does that mean the mangled name is the same for base and derived when the function is virtual?

avakar · Accepted Answer

Virtual tables are merely arrays of function pointers, just like your second snippet. The compiler translates calls to virtual functions to calls through a pointer, for example

Base * b = /* ... */;
b->function2();

gets translated to

b->__vtable[1](b);

where I used the name __vtable to refer to the virtual table (note, however, that the virtual table is typically not accessible directly) and passed b explicitly as the this pointer.

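The order of entries in the table is typically determined by the order in which the virtual functions are first declared, starting with the base class. An override in a derived class reuses the slot of the base function, whatever order the derived class declares it in. The compiler knows the slot index because the class definition is always available at the point of call, so no function name is looked up at run time.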
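To make the "array of function pointers" picture concrete, here is a minimal hand-rolled sketch of what a compiler might generate for the classes in the question. Everything in it (Object, Slot, base_vtable, call_function2 and so on) is a hypothetical name invented for illustration; real compilers lay this out according to their ABI, and the table is not accessible like this.

```cpp
#include <cassert>
#include <string>

// Hand-rolled emulation; every name here is hypothetical, not a compiler API.
struct Object;
using Slot = std::string (*)(const Object*);   // each slot takes an explicit "this"

struct Object {
    const Slot* vptr;   // stands in for the hidden pointer stored in each object
};

std::string base_function1(const Object*)    { return "Base - function1"; }
std::string base_function2(const Object*)    { return "Base - function2"; }
std::string derived_function1(const Object*) { return "Derived - function1"; }
std::string derived_function2(const Object*) { return "Derived - function2"; }

// One table per class. Slot 0 is function1 and slot 1 is function2 in both,
// so the same index works whatever the dynamic type is.
const Slot base_vtable[]    = { base_function1, base_function2 };
const Slot derived_vtable[] = { derived_function1, derived_function2 };

// What "b->function2()" boils down to: an indexed load and an indirect call.
std::string call_function2(const Object* b) {
    assert(b != nullptr && b->vptr != nullptr);
    return b->vptr[1](b);
}
```

Calling call_function2 on an Object whose vptr is base_vtable returns "Base - function2", and on one whose vptr is derived_vtable it returns "Derived - function2". No name, mangled or otherwise, takes part in the call.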

user966379 · Answer

I will explain using the following code. I think it will make things clear.

  Base *p = new Derived;
  p->function2();

The VTables are created at compile time. The VTable of class Derived has the same layout as the VTable of class Base: both have 2 entries, in the same positions, but the entries of the Derived VTable point to Derived's overrides. The compiler also inserts code into the constructors to initialise the vptr of each object to the right VTable.

When the compiler sees the statement p->function2();, it does not bind the call to a particular function, as it only knows the static type Base. From the VTable layout of class Base it knows the position of function2 (here, the 2nd position in the VTable).

At run time, the vptr of the object points to the VTable of class Derived, because it was set when the Derived object was constructed. The function in the 2nd position of that VTable is invoked.
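The behaviour described above can be observed with ordinary C++, without touching the table itself. This is a small sketch based on the classes in the question; call_second and check_dispatch are helper names made up for the example.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string function1() { return "Base - function1"; }
    virtual std::string function2() { return "Base - function2"; }
};

class Derived : public Base {
public:
    // Declared in the opposite order on purpose: dispatch still works,
    // because overrides are matched to the functions declared in Base.
    std::string function2() override { return "Derived - function2"; }
    std::string function1() override { return "Derived - function1"; }
};

// Only the static type Base is visible here; the target is chosen at run time.
std::string call_second(Base& p) { return p.function2(); }

// Self-check of the expected dispatch results.
void check_dispatch() {
    Base b;
    Derived d;
    assert(call_second(b) == "Base - function2");
    assert(call_second(d) == "Derived - function2");
}
```

The same call site in call_second reaches two different functions, depending only on which object was constructed.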

Matthieu M. · Answer

The simplest way to clear this is to look up an actual implementation.

Consider the following code:

struct Base { virtual void foo() = 0; };

struct Derived : Base { virtual void foo() { } };

Base& base();

void bar() {
  Base& b = base();
  b.foo();           // virtual call
}

And now, feed this to the Try Out page of Clang to obtain LLVM IR:

; ModuleID = '/tmp/webcompile/_6336_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

%struct.Base = type { i32 (...)** }

define void @_Z3barv() {
  %1 = tail call %struct.Base* @_Z4basev()
  %2 = bitcast %struct.Base* %1 to void (%struct.Base*)***
  %3 = load void (%struct.Base*)*** %2, align 8
  %4 = load void (%struct.Base*)** %3, align 8
  tail call void %4(%struct.Base* %1)
  ret void
}

declare %struct.Base* @_Z4basev()

Since I suppose you might not know about the IR yet, let's review it piece by piece.

First comes some stuff that you need not worry about. It identifies the architecture (processor and system) for which this is compiled, along with its properties.

; ModuleID = '/tmp/webcompile/_6336_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

Then, LLVM is taught about the types:

%struct.Base = type { i32 (...)** }

It analyzes the types structurally. So here we only learn that Base is composed of a single element of type i32 (...)**: this is actually the "infamous" v-table pointer. Why this weird type? Because the v-table stores a lot of function pointers of different types. That would make it a heterogeneous array (which is not possible), so instead it is treated as an array of "generic" unknown elements (to mark that we are unsure of what is there), and it is up to the application to cast the pointer to the appropriate function pointer type before actually using it (or rather, it would be if we were in C or C++; the IR is much lower level).
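In C++ terms, the "generic slot, cast before use" idea looks roughly like the sketch below. It is only an analogy: GenericFn, table, add_one, greet and the call_slot helpers are hypothetical names for this example, not anything Clang emits.

```cpp
#include <cassert>
#include <string>

using GenericFn = void (*)();   // plays the role of the IR's generic slot type

int add_one(int x) { return x + 1; }
std::string greet() { return "hello"; }

// Functions with different signatures stored behind one generic pointer type.
const GenericFn table[] = {
    reinterpret_cast<GenericFn>(&add_one),
    reinterpret_cast<GenericFn>(&greet),
};

// Before calling, each slot is cast back to its real function pointer type.
int call_slot0(int x) {
    auto fn = reinterpret_cast<int (*)(int)>(table[0]);
    return fn(x);
}

std::string call_slot1() {
    auto fn = reinterpret_cast<std::string (*)()>(table[1]);
    return fn();
}

// Self-check: the round-tripped pointers call the original functions.
void check_slots() {
    assert(call_slot0(41) == 42);
    assert(call_slot1() == "hello");
}
```

Converting a function pointer to another function pointer type and back gives the original pointer, so the calls are well defined as long as each slot is cast back to exactly the type it was stored from.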

Jumping to the end:

declare %struct.Base* @_Z4basev()

this declares a function (_Z4basev, the name is mangled) which returns a pointer to Base: in the IR references and pointers are both represented by pointers.

Okay, so let's see the definition of bar (or _Z3barv as it is mangled). This is where the interesting things lie:

  %1 = tail call %struct.Base* @_Z4basev()

A call to base, which returns a pointer to Base (the return type is always specified at the call site, which makes it much easier to analyze); the result is stored in a value called %1.

  %2 = bitcast %struct.Base* %1 to void (%struct.Base*)***

A weird bitcast that transforms our Base* into a pointer to strange things... In essence, we are obtaining the address of the v-table pointer. It has not been "named"; we just ensured in the definition of the type that it is the first element.

  %3 = load void (%struct.Base*)*** %2, align 8
  %4 = load void (%struct.Base*)** %3, align 8

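We first load the v-table pointer (stored at the address %2) and then load the function pointer (stored at the address %3). No offset is added because foo occupies the first slot of the v-table. At this point, if the object's dynamic type is Derived, %4 is therefore &Derived::foo.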

  tail call void %4(%struct.Base* %1)

Finally, we call the function, and we pass it the this element, made explicit here.

nimrodm · Answer

The second case, assuming pointers take 4 bytes (32-bit machines).

Function names are not stored in the virtual table (and generally appear in the executable only as symbol or debugging information). A virtual table is just an array of function pointers, directly indexed by the running code.
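The +0 and +4 in the question are then just byte offsets into that array: slot index times the pointer size. A small sketch of the arithmetic follows; Slot, slot_offset, measured_offset and check_offsets are hypothetical helpers, and the assumption is that every slot is one function pointer wide.

```cpp
#include <cassert>
#include <cstddef>

using Slot = void (*)();   // hypothetical generic slot type

// Byte offset of entry `index` from the start of the table.
constexpr std::size_t slot_offset(std::size_t index) {
    return index * sizeof(Slot);
}

// Measures the same offset from real addresses in an array of slots.
std::size_t measured_offset(const Slot* table, std::size_t index) {
    const char* start = reinterpret_cast<const char*>(table);
    const char* entry = reinterpret_cast<const char*>(table + index);
    return static_cast<std::size_t>(entry - start);
}

// Self-check: computed and measured offsets agree.
void check_offsets() {
    const Slot table[2] = { nullptr, nullptr };
    assert(measured_offset(table, 0) == 0);
    // 4 where function pointers are 4 bytes, 8 where they are 8 bytes.
    assert(measured_offset(table, 1) == slot_offset(1));
}
```

On a typical 64-bit build the second entry therefore sits at +8 rather than +4.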

paper.plane · Answer

When a virtual function is added to a class, the compiler creates a hidden pointer (called the v-ptr) as a member of the class. [You can check this by taking sizeof(class), which typically grows by sizeof(pointer), plus any padding.] The compiler also adds code at the beginning of the constructor to initialize the v-ptr to the start of the class's v-table.

When another class derives from this class, it inherits the v-ptr as well. For a Derived object, the v-ptr is initialized to the start of the Derived class's v-table. The v-table of each class stores the addresses of that class's versions of the virtual functions. [Note that if a virtual function is not overridden in the derived class, the v-table stores the address of the most-derived version available in the hierarchy (for multi-level inheritance), which may be the base version.]

Hence, at run time, the call simply goes through this v-ptr. If a base class pointer points to a Base object, the v-ptr points to the Base version of the v-table, so the Base version of the function is invoked. The same applies to a Derived object, whose v-ptr points to the Derived v-table.
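Both observations can be checked from plain C++. The sketch below uses made-up types (Plain, Polymorphic, Base, Derived) and helpers (has_hidden_pointer, check_ctor_dispatch). The size comparison holds on the mainstream compilers I know of but is not guaranteed by the standard. The constructor behaviour is guaranteed: while the Base constructor runs, virtual calls resolve to Base's versions, which fits the picture of the v-ptr still pointing at the Base v-table at that moment.

```cpp
#include <cassert>
#include <string>

struct Plain       { int value = 0; };
struct Polymorphic { int value = 0; virtual ~Polymorphic() = default; };

// Typically true: the polymorphic class carries an extra hidden pointer.
bool has_hidden_pointer() { return sizeof(Polymorphic) > sizeof(Plain); }

struct Base {
    std::string seen_in_ctor;
    // The Derived part does not exist yet, so this call runs Base::name.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Self-check: dispatch differs during and after construction.
void check_ctor_dispatch() {
    Derived d;
    assert(d.seen_in_ctor == "Base");   // during Base's constructor
    assert(d.name() == "Derived");      // after construction has finished
}
```

Once construction has finished, the same object dispatches name() to Derived::name.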
