 

What is the performance cost of having a virtual method in a C++ class?


I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call costs 7 nanoseconds longer than a direct (non-virtual) function call.

So, not really worth worrying about the cost unless the function is something like a trivial Get()/Set() accessor, in which case anything other than inline is kind of wasteful. A 7ns overhead on a function that inlines to 0.5ns is severe; a 7ns overhead on a function that takes 500ms to execute is meaningless.

The big cost of virtual functions isn't really the lookup of a function pointer in the vtable (that's usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble as the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer computed. So, the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.

Edit: Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in cache then the whole program comes to a dead halt while the instructions are fetched from main memory. This is always a significant stall: on Xenon, about 650 cycles (by my tests).

However this isn't a problem specific to virtual functions because even a direct function call will cause a miss if you jump to instructions that aren't in cache. What matters is whether the function has been run before recently (making it more likely to be in cache), and whether your architecture can predict static (not virtual) branches and fetch those instructions into cache ahead of time. My PPC does not, but maybe Intel's most recent hardware does.

My timings control for the influence of icache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they discount that cost.


There is definitely measurable overhead when calling a virtual function - the call must use the vtable to resolve the address of the function for that type of object. The extra instructions are the least of your worries. Not only do vtables prevent many potential compiler optimizations (since the call is resolved at runtime, the compiler can't inline or otherwise optimize it), they can also thrash your I-Cache.
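
To make the inlining point concrete, here is a minimal sketch (the Shape/viaBasePointer names are mine, not from the answer). Compiling it with something like g++ -O2 and inspecting the assembly typically shows the non-virtual call inlined away, while the call through the base pointer remains an indirect call through the vtable:

#include <iostream>

struct Shape
{
    Shape() : cached_(1.0) {}
    virtual ~Shape() {}

    // Virtual: resolved through the vtable at runtime.
    virtual double area() const { return cached_; }

    // Non-virtual: an ordinary call the compiler can inline.
    double cachedArea() const { return cached_; }

    double cached_;
};

double viaBasePointer(const Shape* s)
{
    // Indirect call through the vtable: the dynamic type of *s isn't known
    // here, so the compiler generally cannot inline or fold this call.
    return s->area();
}

double nonVirtual(const Shape& s)
{
    // Direct call: typically inlined at -O2, leaving just a load of cached_.
    return s.cachedArea();
}

int main()
{
    Shape s;
    std::cout << viaBasePointer(&s) << ' ' << nonVirtual(s) << '\n';
}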

Of course whether these penalties are significant or not depends on your application, how often those code paths are executed, and your inheritance patterns.

In my opinion though, having everything as virtual by default is a blanket solution to a problem you could solve in other ways.

Perhaps you could look at how classes are designed/documented/written. Generally the header for a class should make quite clear which functions can be overridden by derived classes and how they are called. Having programmers write this documentation is helpful in ensuring they are marked correctly as virtual.
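
For example, a hypothetical header (the class and its members are made up purely for illustration) might spell out which functions are intended as overridable extension points and which are not:

// Widget.h - hypothetical example of making a class's extension points
// explicit in the header.
class Widget
{
public:
    Widget() : dirty_(false) {}
    virtual ~Widget() {}

    // Extension point: called once per frame by the framework. Overrides
    // should call Widget::draw() first so the default background is painted.
    virtual void draw() { /* default background */ }

    // Not an extension point: fixed bookkeeping shared by all widgets.
    void invalidate() { dirty_ = true; }

private:
    // Implementation detail; deliberately non-virtual and non-overridable.
    void recomputeLayout() {}

    bool dirty_;
};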

I would also say that declaring every function as virtual could lead to more bugs than just forgetting to mark something as virtual. If all functions are virtual, everything can be replaced by subclasses - public, protected, private - everything becomes fair game. By accident or intention, subclasses could then change the behavior of functions in ways that cause problems when they are used by the base implementation.


It depends. :) (Had you expected anything else?)

Once a class gets a virtual function, it can no longer be a POD datatype (it may not have been one before either, in which case this won't make a difference), and that makes a whole range of optimizations impossible.

std::copy() on plain POD types can resort to a simple memcpy routine, but non-POD types have to be handled more carefully.

Construction becomes a lot slower because the vtable has to be initialized. In the worst case, the difference in performance between POD and non-POD datatypes can be significant.
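
A quick way to observe that transition (my own sketch, not from the answer; it needs C++11 for the type traits): adding a single virtual function flips the traits that let std::copy degrade to a raw memcpy and let default construction be a no-op.

#include <type_traits>

struct Plain        { int x; int y; };                      // POD: no vtable
struct WithVirtual  { int x; int y; virtual void f() {} };  // one virtual function added

// Plain is a POD / trivially copyable type, so std::copy over it can
// lower to memcpy and its default construction does nothing.
static_assert(std::is_pod<Plain>::value, "Plain should be POD");

// Adding a single virtual function removes both properties: the object now
// carries a vptr that every constructor has to set up.
static_assert(!std::is_pod<WithVirtual>::value, "vtable => not POD");
static_assert(!std::is_trivially_copyable<WithVirtual>::value, "vtable => no memcpy copying");

int main() {}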

In the worst case, you may see 5x slower execution (that number is taken from a university project I did recently to reimplement a few standard library classes: our container took roughly 5x as long to construct as soon as the data type it stored got a vtable).

Of course, in most cases you're unlikely to see any measurable performance difference; this is simply to point out that in some border cases it can be costly.

However, performance shouldn't be your primary consideration here. Making everything virtual is not a perfect solution for other reasons.

Allowing everything to be overridden in derived classes makes it much harder to maintain class invariants. How does a class guarantee that it stays in a consistent state when any one of its methods could be redefined at any time?

Making everything virtual may eliminate a few potential bugs, but it also introduces new ones.
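
To illustrate the invariant problem concretely (a made-up Counter example, not from the answer): once a mutator is virtual, a derived class can silently break an assumption that code written against the base class relies on.

#include <cassert>

class Counter
{
public:
    Counter() : count_(0) {}
    virtual ~Counter() {}

    // Invariant the base class relies on: count_ never decreases.
    virtual void increment() { ++count_; }

    int count() const { return count_; }

protected:
    int count_;
};

class BrokenCounter : public Counter
{
public:
    // A derived class is free to violate the base class's assumption...
    virtual void increment() { --count_; }
};

void addThree(Counter& c)
{
    int before = c.count();
    c.increment();
    c.increment();
    c.increment();
    // ...so code written against the base class's contract now fails.
    assert(c.count() == before + 3);
}

int main()
{
    BrokenCounter bad;
    addThree(bad);   // assertion fires: the invariant no longer holds
}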


If you need the functionality of virtual dispatch, you have to pay the price. The advantage of C++ is that you can use a very efficient implementation of virtual dispatch provided by the compiler, rather than a possibly inefficient version you implement yourself.

However, lumbering yourself with the overhead if you don't need it is possibly going a bit too far. And most classes are not designed to be inherited from - creating a good base class requires more than making its functions virtual.


Virtual dispatch is an order of magnitude slower than some alternatives - not due to indirection so much as the prevention of inlining. Below, I illustrate that by contrasting virtual dispatch with an implementation embedding a "type(-identifying) number" in the objects and using a switch statement to select the type-specific code. This avoids function call overhead completely - just doing a local jump. There is a potential cost in maintainability, recompilation dependencies etc. through the forced localisation (in the switch) of the type-specific functionality.


IMPLEMENTATION

#include <iostream>
#include <vector>
#include <time.h>   // clock_gettime / timespec (POSIX; linked with -lrt below)

// virtual dispatch model...

struct Base
{
    virtual int f() const { return 1; }
};

struct Derived : Base
{
    virtual int f() const { return 2; }
};

// alternative: member variable encodes runtime type...

struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};

struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};

struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};

struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};

int main()
{
  for (int j = 0; j < 3; ++j)
  {
    typedef std::vector<Base*> V;
    V v;

    for (int i = 0; i < 1000; ++i)
        v.push_back(i % 2 ? new Base : (Base*)new Derived);

    int total = 0;

    Timer tv;

    for (int i = 0; i < 100000; ++i)
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)
            total += (*i)->f();

    double tve = tv.elapsed();

    std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';

    // ----------------------------

    typedef std::vector<Type*> W;
    W w;

    for (int i = 0; i < 1000; ++i)
        w.push_back(i % 2 ? (Type*)new A : (Type*)new B);

    total = 0;

    Timer tw;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
        {
            if ((*i)->type_ == 1)
                total += ((A*)(*i))->f();
            else
                total += ((B*)(*i))->f();
        }

    double twe = tw.elapsed();

    std::cout << "switched: " << total << ' ' << twe << '\n';

    // ----------------------------

    total = 0;

    Timer tw2;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
            total += (*i)->type_;

    double tw2e = tw2.elapsed();

    std::cout << "overheads: " << total << ' ' << tw2e << '\n';
  }
}

PERFORMANCE RESULTS

On my Linux system:

~/dev  g++ -O2 -o vdt vdt.cc -lrt
~/dev  ./vdt                     
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726

This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.


COMMENTS RE VIRTUAL DISPATCH

It must be said though that virtual function call overheads are something that's rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much - so do the profiling before finding awkward alternatives. The main issue with them is that they perform an out-of-line function call, though they also delocalise the code executed, which changes the cache utilisation patterns (for better or (more often) worse).
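
As a hedged sketch of the "get a whole lot of things at once" idea (hypothetical names, not from the answer): instead of several tiny virtual getters, one virtual call can hand back a plain struct of values, after which the individual field reads are ordinary, inlinable accesses.

#include <iostream>

// Many tiny virtual accessors: one indirect call per field read.
struct ParticleA
{
    virtual ~ParticleA() {}
    virtual double x() const { return 1.0; }
    virtual double y() const { return 2.0; }
    virtual double z() const { return 3.0; }
};

// Plain struct returned by a single virtual call.
struct State { double x, y, z; };

struct ParticleB
{
    virtual ~ParticleB() {}
    virtual State state() const
    {
        State s = { 1.0, 2.0, 3.0 };
        return s;
    }
};

int main()
{
    ParticleA a;
    double sumA = a.x() + a.y() + a.z();   // three virtual calls

    ParticleB b;
    State s = b.state();                   // one virtual call
    double sumB = s.x + s.y + s.z;         // plain struct reads

    std::cout << sumA << ' ' << sumB << '\n';
}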