
How does the compiler generate code for virtual function calls?

Tags:

c++

enter image description here

CAT *p;
...
p->speak();
...

A book I read says that the compiler translates p->speak() to:

(*p->vptr[i])(p); //i is the idx of speak in the vtbl

My question is: at compile time it is impossible to know the real type of p, which means it is impossible to know which vptr or vtbl to use. So how does the compiler generate correct code?

[modified]

For example:

void foo(CAT* c)
{
    c->speak();
    // if c points to a SmallCat,
    // this should translate to (*c->vptr[i])(c); // use vtbl at 0x1234
    // if c points to a CAT,
    // this should translate to (*c->vptr[i])(c); // use vtbl at 0x5678

    // Since ps and pc are both CAT*, how can the compiler generate different code
    // for them at compile time?
}

...
CAT *ps,*pc;
ps = new SmallCat;  //suppose SmallCat's vtbl address is 0x1234;
pc = new CAT;       //suppose CAT's vtbl address is 0x5678;
...
foo(ps);
foo(pc);
...

Any ideas? Thanks.

camino asked Feb 04 '14 20:02



4 Answers

What your picture is missing is an arrow from each of the CAT and SmallCat objects to its corresponding vtbl. The compiler embeds a pointer to the vtbl in the object itself; you can think of it as a hidden member variable. That is why adding the first virtual function is said to "cost" one pointer per object in memory footprint. The constructor sets up this pointer, so all a compiler-generated virtual call needs to do to reach the right vtbl at run time is read the vptr through the object pointer (this).

Of course, this gets more complicated with virtual and multiple inheritance: the compiler needs to generate slightly different code, but the basic process remains the same.

Here is your example explained in more detail:

CAT *p1,*p2;
p1 = new SmallCat;  //suppose its vtbl address is 0x1234;
// The layout of SmallCat object includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x1234.
p2 = new CAT;       //suppose its vtbl address is 0x5678;
// The layout of a CAT object also includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x5678.
(*p1->vptr[i])(p1); // should use vtbl at 0x1234
// The generated code has enough information to do that, because the constructor
// squirreled away 0x1234 inside the SmallCat object when it was constructed.
(*p2->vptr[i])(p2); // should use vtbl at 0x5678
// Same deal - the constructor saved 0x5678 inside the CAT, so we're good.
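
As a runnable illustration of the point above, here is a minimal sketch. The class bodies and the string return type are assumptions made only so the result can be checked; they are not from the question. foo is compiled once, and the only thing that differs between the two calls is the vptr stored in the object it receives.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the question's classes; speak() returns a
// string here only so the result is easy to inspect.
struct CAT {
    virtual ~CAT() = default;
    virtual std::string speak() const { return "meow"; }
};

struct SmallCat : CAT {
    std::string speak() const override { return "small meow"; }
};

// Compiled once. On typical implementations the call loads the vptr stored
// in *c at run time, so the same instructions reach a different function
// depending on which object is passed in.
std::string foo(const CAT* c) { return c->speak(); }

// Usage: both pointers have static type CAT*, yet the results differ.
void run_example() {
    SmallCat small;
    CAT plain;
    const CAT* ps = &small;
    const CAT* pc = &plain;
    assert(foo(ps) == "small meow");
    assert(foo(pc) == "meow");
}
```
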
Sergey Kalinichenko answered Oct 21 '22 23:10


which means it is impossible to know which vptr or vtbl to use

That's correct during method invocation. But at construction time, the type of the constructed object is actually known, and the compiler will generate code in the ctor to initialize the vptr to point to the vtbl of the corresponding class. All the later virtual method invocations will call the method in the right vtbl via this vptr.

For more details on how exactly this initialization works with base objects (with multiple ctors being called in sequence), please refer to this answer to a similar question.
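
One observable consequence of the constructors running in sequence can be sketched as follows. Base, Derived, and seen_in_ctor are hypothetical names for illustration. The language guarantees that a virtual call made inside a base-class constructor resolves to the base version; on vtable-based implementations this is usually explained by the vptr still pointing at the base class's table at that moment.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_ctor;
    // Virtual call while the Base part is being constructed:
    // the object is still "a Base" at this point.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

void run_example() {
    Derived d;
    // During Base(), the call resolved to Base::name.
    assert(d.seen_in_ctor == "Base");
    // After construction finished, the same call reaches Derived::name.
    const Base& b = d;
    assert(b.name() == "Derived");
}
```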

user3146587 answered Oct 21 '22 23:10


The compiler implicitly adds a pointer called vptr to every object of a class that has one or more virtual functions.

You can verify this by applying sizeof to such a class: the result is larger than you would otherwise expect, typically by sizeof(void*) (4 or 8 bytes), plus any alignment padding.
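
A quick way to see the size difference is sketched below. Plain and WithVirtual are hypothetical types, and the exact sizes are implementation-defined, so the check only assumes that the class with a virtual function is larger, which holds on mainstream compilers.

```cpp
#include <cassert>
#include <cstddef>

// Same data member; the only difference is the virtual keyword.
struct Plain       { void* data; void f() {} };
struct WithVirtual { void* data; virtual void f() {} };

// Extra bytes attributable to the hidden vptr (plus padding, if any).
std::size_t vptr_overhead() { return sizeof(WithVirtual) - sizeof(Plain); }

void run_example() {
    // Not guaranteed by the standard, but true on common ABIs.
    assert(sizeof(WithVirtual) > sizeof(Plain));
    assert(vptr_overhead() > 0);
}
```

On a typical 64-bit platform vptr_overhead() is 8, but treat that number as an expectation rather than a guarantee.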

The compiler also adds to each constructor of such a class an implicit piece of code that sets vptr to point to a table of function pointers (a.k.a. the V-Table).

When an object is instantiated, its type is explicitly "mentioned".

For example: A a(1) or A* p = new B(2).

So inside the constructor, during runtime, vptr can be easily set to point to the correct V-Table.

In the example above:

  • The vptr of a is set to point to the V-Table of class A.

  • The vptr of the object pointed to by p is set to point to the V-Table of class B.

BTW, the constructor differs from all other functions in that you have to explicitly name the object type in order to call it (hence a constructor can never be declared virtual).

Here is how the compiler generates the correct code for a virtual function call p->speak():

CAT *p;
...
p = new SuperCat("SaberTooth",2); // p->vptr = SuperCat_Vtable
...
p->speak(); // See pseudo assembly code below

Ax = p               // Get the address of the instance
Bx = p->vptr         // Get the address of the instance's V-Table
Cx = Bx + CAT::speak // Add the number of the function in its class
Dx = *Cx             // Get the address of the appropriate function
Push Ax              // Push the address of the instance into the stack
Push Dx              // Push the address of the function into the stack
CallF                // Save some registers and jump to the beginning of the function

The compiler uses the same number (index) for all speak functions in the hierarchy of class CAT.

Here is how the compiler generates the correct code for a non-virtual function call p->eat():

p->eat(); // See pseudo assembly code below

Ax = p        // Get the address of the instance
Bx = CAT::eat // Get the address of the function
Push Ax       // Push the address of the instance into the stack
Push Bx       // Push the address of the function into the stack
CallF         // Save some registers and jump to the beginning of the function

Since the address of the eat function is known at compile-time, the assembly code is more efficient.
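
The difference between the two kinds of call can be observed directly. This is a minimal sketch with hypothetical class bodies: speak is virtual and eat is not, and SmallCat declares its own eat, which hides CAT::eat without overriding it.

```cpp
#include <cassert>
#include <string>

struct CAT {
    virtual ~CAT() = default;
    virtual std::string speak() const { return "CAT::speak"; }
    std::string eat() const { return "CAT::eat"; }       // non-virtual
};

struct SmallCat : CAT {
    std::string speak() const override { return "SmallCat::speak"; }
    std::string eat() const { return "SmallCat::eat"; }  // hides, does not override
};

void run_example() {
    SmallCat small;
    const CAT* p = &small;
    // Virtual: resolved at run time from the object p points to.
    assert(p->speak() == "SmallCat::speak");
    // Non-virtual: resolved at compile time from the pointer type CAT*.
    assert(p->eat() == "CAT::eat");
    // Called on the object with its own static type, the hiding version wins.
    assert(small.eat() == "SmallCat::eat");
}
```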

And finally, here is how 'vptr' is set to point to the correct V-Table during runtime:

class SmallCat
{
    void* vptr; // implicitly added by the compiler
    ...         // your explicit variables
    SmallCat()
    {
        vptr = (void*)0x1234; // implicitly added by the compiler
        ...                   // Your explicit code
    }
};

When you instantiate CAT* p = new SmallCat(), a new object is created with its vptr set to 0x1234.

barak manos answered Oct 21 '22 23:10


When you write this (I've written all user code in lowercase):

class cat {
public:
    virtual void speak() {std::cout << "meow\n";}
    virtual void eat() {std::cout << "eat\n";}
    virtual void destructor() {std::cout << "destructor\n";}
};

The compiler generates all of this magically (All my sample compiler code is uppercase):

class cat;
struct CAT_VTABLE_TYPE { //here's the cat's vtable type
    void(*speak)(cat* this); //contains a pointer for each virtual function
    void(*eat)(cat* this);
    void(*destructor)(cat* this);
};
extern CAT_VTABLE_TYPE CAT_VTABLE; //defined later: one global vtable shared by all cat objects
class cat { //here's the class you typed
private:
    CAT_VTABLE_TYPE* vptr; //but the compiler adds this magic member
public:
    cat() :vptr(&CAT_VTABLE) {} //the compiler initializes the vtable ptr
    ~cat() {vptr->destructor(this);} //redirects to the one you coded
    void speak() {vptr->speak(this);} //redirects to the one you coded
    void eat() {vptr->eat(this);} //redirects to the one you coded
};

//Here's the functions you programmed
void DEFAULT_CAT_SPEAK(cat* this) {std::cout << "meow\n";}
void DEFAULT_CAT_EAT(cat* this) {std::cout << "eat\n";}
void DEFAULT_CAT_DESTRUCTOR(cat* this) {std::cout << "destructor\n";}
//and the global cat vtable (shared by all cat objects)
const CAT_VTABLE_TYPE CAT_VTABLE = {
    DEFAULT_CAT_SPEAK, 
    DEFAULT_CAT_EAT, 
    DEFAULT_CAT_DESTRUCTOR};

Well, that's a lot isn't it? (I actually cheated slightly, since I take the address of an object before it's defined, but this way is less code and less confusing, even if technically uncompilable) You can see why they built it into the language. And... here's SmallCat before:

class smallcat : public cat {
public:
    virtual void speak() {std::cout << "meow2\n";}
    virtual void destructor() {std::cout << "destructor2\n";}
};

and after:

class smallcat;
//here's the smallcat's vtable type
struct SMALLCAT_VTABLE_TYPE : public CAT_VTABLE_TYPE { 
     //contains no additional virtual functions that cat didn't have
};
extern SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE; //defined later: one global vtable shared by all smallcat objects
class smallcat : public cat { //here's the class you typed
public:
    smallcat() {vptr = &SMALLCAT_VTABLE;} //runs after cat(): the compiler overwrites the vtable ptr
    //The other functions already are virtual, nothing additional needed
};
//Here's the functions you programmed
void DEFAULT_SMALLCAT_SPEAK(cat* this) {std::cout << "meow2\n";}
void DEFAULT_SMALLCAT_DESTRUCTOR(cat* this) {std::cout << "destructor2\n";}
//and the global smallcat vtable (shared by all smallcat objects)
const SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE = {
    DEFAULT_SMALLCAT_SPEAK, 
    DEFAULT_CAT_EAT, //note: eat wasn't overridden
    DEFAULT_SMALLCAT_DESTRUCTOR};

So, if that's too much to read, the compiler makes a VTABLE object for each type, which points to the member functions for that particular type, and then it sticks a pointer to that VTABLE inside each instance.

When you create a smallcat object, the compiler constructs the cat parent object, which assigns the vptr to point at the CAT_VTABLE global. Immediately after, the compiler constructs the smallcat derived object, which overwrites the vptr member to make it point at the SMALLCAT_VTABLE global.

When you call c->speak();, the compiler calls its copy of cat::speak (which looks like this->vptr->speak(this);). The vptr member points at either the global CAT_VTABLE or the global SMALLCAT_VTABLE, so that table's speak pointer points at either DEFAULT_CAT_SPEAK (the code you put in cat::speak) or DEFAULT_SMALLCAT_SPEAK (the code you put in smallcat::speak). So this->vptr->speak(this); ends up calling the function for the most derived type, whatever that type is.
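
For readers who want to run the idea, here is a compilable sketch of the same emulation. It is a simplification under stated assumptions: the parameter is named self because this is a keyword, the functions return strings instead of printing, the destructor slot is left out, and the lowercase function and type names are hypothetical. Real compilers do not expose any of this.

```cpp
#include <cassert>
#include <string>

struct cat;

// One function pointer per virtual function; `self` stands in for `this`.
struct cat_vtable {
    std::string (*speak)(const cat* self);
    std::string (*eat)(const cat* self);
};

// The bodies the user "wrote" for each class.
std::string cat_speak(const cat*)      { return "meow"; }
std::string cat_eat(const cat*)        { return "eat"; }
std::string smallcat_speak(const cat*) { return "meow2"; }

// One shared table per class.
const cat_vtable CAT_VTABLE      = {cat_speak, cat_eat};
const cat_vtable SMALLCAT_VTABLE = {smallcat_speak, cat_eat};  // eat not overridden

struct cat {
    const cat_vtable* vptr;                  // the "hidden" member, made explicit
    cat() : vptr(&CAT_VTABLE) {}
    std::string speak() const { return vptr->speak(this); }  // dispatch via the table
    std::string eat() const   { return vptr->eat(this); }
};

struct smallcat : cat {
    smallcat() { vptr = &SMALLCAT_VTABLE; }  // runs after cat(), overwrites the vptr
};

void run_example() {
    cat c;
    smallcat s;
    const cat* pc = &c;
    const cat* ps = &s;
    assert(pc->speak() == "meow");
    assert(ps->speak() == "meow2");  // same call site, different table
    assert(ps->eat() == "eat");      // inherited slot still points at cat_eat
}
```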

All in all, it is admittedly very confusing, since the compiler is magically renaming functions at compile time. Actually, due to multiple inheritance, in reality it's far more confusing than I've shown here.

Mooing Duck answered Oct 21 '22 22:10