
Optimization of virtual table lookups

With code like the following, can a compiler tell that a in fact points to an instance of B and optimize away the virtual table lookup?

#include <iostream>

class A
{
  public:
    virtual void f()
    {
        std::cout << "A::f()" << std::endl;
    }
};

class B : public A
{
  public:
    void f()
    {
        std::cout << "B::f()" << std::endl;
    }
};

int main()
{
    B b;
    A* a = &b;
    a->f();

    return 0;
}

Additional question after the answers of Jonthan Seng and reima: if gcc is used, are any flags needed to make it optimize away the vtable lookup?

asked Sep 13 '12 16:09 by Ruup




1 Answer

Clang can easily make this optimization, and even inlines the function call. This can be seen from the generated assembly:

Dump of assembler code for function main():
   0x0000000000400500 <+0>: push   %rbp
   0x0000000000400501 <+1>: mov    %rsp,%rbp
   0x0000000000400504 <+4>: mov    $0x40060c,%edi
   0x0000000000400509 <+9>: xor    %al,%al
   0x000000000040050b <+11>:  callq  0x4003f0 <printf@plt>
   0x0000000000400510 <+16>:  xor    %eax,%eax
   0x0000000000400512 <+18>:  pop    %rbp
   0x0000000000400513 <+19>:  retq   

I took the liberty of replacing std::cout << … by equivalent calls to printf, as this greatly reduces the clutter in the disassembly.

GCC 4.6 can also deduce that no vtable lookup is needed, but does not inline:

Dump of assembler code for function main():
   0x0000000000400560 <+0>: sub    $0x18,%rsp
   0x0000000000400564 <+4>: mov    %rsp,%rdi
   0x0000000000400567 <+7>: movq   $0x4007c0,(%rsp)
   0x000000000040056f <+15>:  callq  0x400680 <B::f()>
   0x0000000000400574 <+20>:  xor    %eax,%eax
   0x0000000000400576 <+22>:  add    $0x18,%rsp
   0x000000000040057a <+26>:  retq   

answered Sep 25 '22 01:09 by reima