Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ idiom for base class abstract methods without dynamic dispatch overhead?

Tags:

c++

c++11

In C++, is there any way to have an "abstract" base class method (i.e., declared and callable from the base class, but implemented in subclasses) without declaring the method as virtual?

This question, of course, only applies to cases where polymorphism isn't needed (pointers/references to base types never used). Consider the following:

#define NO_OPT asm volatile ("");  // to prevent some compiler optimization

template<typename DerivedType>
void doSomething(DerivedType& d) { d.foo(); }

namespace test1 {

    struct Base {
        inline void foo()
        {
            // ... do common stuff pre-call ...
            foo_impl();
            // ... do common stuff post-call ...
        }
        virtual void foo_impl() = 0; // the abstract method
    };
    struct D1 : public Base { virtual void foo_impl() final { NO_OPT } };
    struct D2 : public Base { virtual void foo_impl() final { NO_OPT } };

    // Then the usage of D1, D2, ..., DN, could be as follows:

    void go() {

        D1 d1; doSomething(d1);
        D2 d2; doSomething(d2);

        for ( long i=0; i < 5000000000; i++ ) {
            // this loop takes [9] seconds
            doSomething(d2);
        }
    }
}

Note that polymorphism is not needed in this case, and that there are plenty of optimization opportunities for the compiler.

However, I benchmarked this code in the latest g++ (4.8.2) and clang (3.4) with -O3 optimizations enabled, including link-time (LTO), and it was significantly slower than the following alternative implementation (using templates instead of virtual method):

namespace test2 {

    template<typename DerivedType>
    struct Base : public DerivedType  // inheritance in reverse
    {
        inline void foo()
        {
            // ... do common stuff pre-call ...
            DerivedType::foo_impl();
            // ... do common stuff post-call ...
        }
    };
    struct D1 { void foo_impl() { NO_OPT } };
    struct D2 { void foo_impl() { NO_OPT } };

    void go() {
        Base<D1> d1; doSomething(d1);
        Base<D2> d2; doSomething(d2);

        for ( long i=0; i < 5000000000; i++ ) {
            // this loop takes [3] seconds
            doSomething(d2);
        }
    }
}

g++ and clang were remarkably consistent, each compiling optimized code that took 9 seconds to execute the test1 loop, but only 3 seconds to execute the test2 loop. So even though the test1 logic has no need for dynamic dispatch as all calls should be able to be resolved at compile time, it is still significantly slower.

So to restate my question: When polymorphism isn't needed, Is there a way to achieve this "abstract method" behavior using the convenient straight-forward class inheritance (like in test1), but without the performance penalty of virtual functions (as achieved in test2)?

like image 361
etherice Avatar asked Jan 13 '23 15:01

etherice


2 Answers

Why should explicitly invoking runtime support for a situation you don't have and don't want to have be considered straightforward? That'll confuse humans, too. Templates are the perfect tool for this job.

You want a common interface. Given your example template implementation, you don't even need any obvious relation between the types that have it, just that they do have it.

The classic CRTP way is

template<class d>
struct b { 
    void foo() { bark(); static_cast<d*>(this)->foo_impl(); bite(); }
};

struct d1 : b<d1> { void foo_impl() { fuzzbang("d1"); } };
struct d2 : b<d2> { void foo_impl() { fizzbuzz("d2"); } };

which people will have seen before, it's idiomatic as requested, but your

class m1 { protected: void foo_impl() { f1("1"); } };
class m2 { protected: void foo_impl() { f2("2"); } };

template<class m>
struct i : private m { void foo() { bark(); m::foo_impl(); bite(); } };

has a lot to recommend it too: there's no casting necessary, and the intent is as clear.


I don't think you can legitimately compare optimizer results on code with embedded volatile accesses with results on similar code without them. Try the two implementations in your actual code. Performance for anything you're doing five billion times is almost certain to be dominated by cache effects not instruction count anyway.

like image 117
jthill Avatar answered Jan 22 '23 09:01

jthill


If a function has a polymorphic value (a reference or pointer to a polymorphic type), and it calls a virtual function through that variable, the compiler must assume that this could be that class or any class derived from it. And since polymorphism doesn't allow the compiler to know whether there are classes derived from it, it must use a virtual dispatch to call the function.

There is only one way to avoid the use of virtual dispatch on calling a virtual function via a polymorphic value: to call the specific type's function directly, exactly as you did in your case #2.

Where you put this call is up to you. You could put the call into foo, you could have a derived_foo alternate function that people use to call the derived-class version, etc. However you want to do it, you must ultimately call the virtual function in a different way. Which requires using a different codepath.

It is possible that, if provided with a polymorphic variable of a final class, a compiler could optimize all virtual calls into that variable into non-virtual dispatches. However, that is not required behavior by the standard. Each compiler may or may not implement that optimization, so unless it becomes widespread, you can't count on it.

This question seems very much like an XY problem. You're trying to solve some problem, and you settled on a base class with a virtual function, with a derived class that you'll always be using. This problem is best solved via the CRTP, which doesn't use virtual functions at all.

like image 31
Nicol Bolas Avatar answered Jan 22 '23 11:01

Nicol Bolas