With code like the example below, can a compiler tell that a in fact points to an instance of B and optimize away the virtual table lookup?
    #include <iostream>

    class A
    {
    public:
        virtual void f()
        {
            std::cout << "A::f()" << std::endl;
        }
    };

    class B : public A
    {
    public:
        void f()
        {
            std::cout << "B::f()" << std::endl;
        }
    };

    int main()
    {
        B b;
        A* a = &b;
        a->f();
        return 0;
    }
Additional question after the answers of Jonathan Seng and reima: if gcc is used, are any flags needed to make it optimize away the vtable lookup?
Virtual functions are slow when you have a cache miss looking them up. As we'll see through benchmarks, they can be very slow. They can also be very fast when used carefully — to the point where it's impossible to measure the overhead.
At compile time, the compiler can't know which code is going to be executed by the o->f() call since it doesn't know what o points to. Hence, you need something called a "virtual table" which is basically a table of function pointers.
You can imagine what happens when you perform inheritance and override some of the virtual functions. The compiler creates a new VTABLE for your new class, and it inserts your new function addresses using the base-class function addresses for any virtual functions you don't override.
There is definitely more work involved in calling a virtual function than in calling a function whose address is known at compile time. For a short and fast function, calling through the virtual dispatch mechanism was 18% slower. For a long and slow function, the difference was much smaller, less than 1%.
Clang can easily make this optimization, and even inlines the function call. This can be seen from the generated assembly:
Dump of assembler code for function main():
0x0000000000400500 <+0>: push %rbp
0x0000000000400501 <+1>: mov %rsp,%rbp
0x0000000000400504 <+4>: mov $0x40060c,%edi
0x0000000000400509 <+9>: xor %al,%al
0x000000000040050b <+11>: callq 0x4003f0 <printf@plt>
0x0000000000400510 <+16>: xor %eax,%eax
0x0000000000400512 <+18>: pop %rbp
0x0000000000400513 <+19>: retq
I took the liberty of replacing std::cout << … by equivalent calls to printf, as this greatly reduces the clutter in the disassembly.
GCC 4.6 can also deduce that no vtable lookup is needed, but does not inline:
Dump of assembler code for function main():
0x0000000000400560 <+0>: sub $0x18,%rsp
0x0000000000400564 <+4>: mov %rsp,%rdi
0x0000000000400567 <+7>: movq $0x4007c0,(%rsp)
0x000000000040056f <+15>: callq 0x400680 <B::f()>
0x0000000000400574 <+20>: xor %eax,%eax
0x0000000000400576 <+22>: add $0x18,%rsp
0x000000000040057a <+26>: retq