Non-virtual interface? (Need a very performant low level abstraction)

Question

I'm trying to micro-optimize my code at a very low level point in the application architecture. So here is my concrete scenario:

I have a parser class which parses a graph file (nodes, edges, adjacency entries etc.)
The file format is versioned, so there exist parsers for each version which are implemented as separate classes (ParserV1, ParserV2, ...).
The parsers provide the same functionality to some upper layer in the application. Thus, they implement the same "interface".
In C++, I'd implement such an interface as an abstract class with all functions being pure virtual.
As virtual functions need another memory lookup and can't be bound statically at compile time and -- much more important -- will not allow inlining of small methods in the parser classes, using a classical sub-classing idiom wouldn't lead to the best performance I can achieve.

[Before describing my possible solutions, I want to explain why I'm doing micro-optimization here (you may skip this paragraph): The parser class has a lot of small methods, where "small" means that they don't do much. Most of them only read one or two bytes or even only one bit from a cached bit stream. So it should be possible to implement them in a very very efficient way, where a function call, when inlined, only needs a handful of machine commands. The methods are called very often in the application, since they look up node attributes in a very big graph (the world-wide road network), which might happen about one million times per user request, and such an request should be as fast as possible.]

Which is the way to go here? I can see the following methods to solve the problem:

Write an interface with pure virtual methods and subclass it. The performance will suffer.
Do not write such an interface. Each parser defines the same methods on its own. In the upper layer (which uses the parser) has pointers (as members) to each version subclass. In the beginning, instantiate the specific parser which should be used. Use a switch block and cast the parser instance to the explicit subclass whenever accessing a function. Will the performance better? (if/switch block vs. virtual table lookup).
Mix the two solutions 1. + 2.: Write an interface with pure virtual methods for seldom used methods, where performance isn't highly critical. If it is critical, don't provide a virtual method but use the second method.
Improving 2.: Provide non-virtual methods in the abstract class; keep a version number as a member variable in the abstract class (kind of own runtime type information) and implement the if/switch blocks and casts in these methods; then call the methods in the subclass. This provide both inlining and static binding.

Are there better ways to solve this problem? Is there any idiom for this?

To clarify, I have a lot of functions which are version-independent (at least until now), and are thus perfectly fitting in some super class. I will use a standard sub-classing design for most functions, while this questions only covers a solution for the version-dependent functions to be optimised. (Some of them aren't called very frequently and I can of course use virtual methods in these cases.) Besides this, I don't like the idea to make the parser class decide which methods need to be performant and which don't. (Although it would be possible to do so.)

templatetypedef · Accepted Answer

One option that might work well is the following: have each parser class define methods with the same signatures, but does so completely independently of each other class. Then, introduce a secondary class hierarchy that implements all of these same functions virtually, then forwards each method call to a concrete parser object. That way, the implementation of the parser gets all the benefits of inlining, since from the perspective of the class all calls can be resolved statically, while the client gets the benefits of polymorphism, since any method call will dynamically resolve to the proper type.

The catch in doing this is that you use extra memory (the wrapper object takes up space), and you will also probably have at least one extra indirection involved when you call the parser functions, since the call goes

client → wrapper → implementation

Depending on how infrequently you call the methods from the client, this implementation might work very well.

Using templates, it's possible to implement the wrapper layer extremely succinctly. The idea is the following. Suppose that you have methods fA, fB, and fC. Start off by defining a base class like this:

class WrapperBase {
public:
    virtual ~WrapperBase() = 0;

    virtual void fA() = 0;
    virtual void fB() = 0;
    virtual void fC() = 0;
};

Now, define the following template type as a subclass:

template <typename Implementation>
    class WrapperDerived: public WrapperBase {
private:
    Implementation impl;

public:
    virtual void fA() {
        impl.fA();
    }
    virtual void fB() {
        impl.fB();
    }
    virtual void fC() {
        impl.fC();
    }
};

Now, you can do something like this:

WrapperBase* wrapper = new WrapperDerived<MyFirstImplementation>();
wrapper->fA();
delete wrapper;

wrapper = new WrapperDerived<MySecondImplementation>();
wrapper->fB();
delete wrapper;

In other words, all of the wrapper code can be generated for you by the compiler by just instantiating the WrapperDerived template.

Hope this helps!

valdo · Answer

First, of couse, you should profile your code to figure-out how much are the vcalls performance-killing in your particular case (besides of potentially weaker optimizations).

Putting the optimization subject aside, I'm almost sure you won't get any significant performance gain by replacing virtual function call (or call a function by a pointer variable, which is almost the same) with a switch that calls compile-time-known functions in different cases.

If you really want a significant improvement - those are the most promising variants IMHO:

Try to redesign your interface to enable more complex functions. For instance, if you have a function that reads a single vertex - modify it to read (up to) N vertexes at once. And so on.
You may make your whole parsing code (that uses your parser) a template class/function, that will use a template parameter to instantiate the needed parser. Here you'll need neither interface nor virtual functions. At the very beginning (where you identify the version) - put a switch, for every recognized version call this function with the appropriate template parameter.

The latter will probably be superior from the performance point of view, OTOH this increases the code size

EDIT:

Here's an example of (2):

template <class Parser>
void MyApplication::HandleSomeRequest(Parser& p)
{
    int n = p.GetVertexCount();
    for (iVertex = 0; iVertex < n; iVertex++)
    {
        // ...    
        p.GetVertexEdges(iVertex, /* ... */);    
        // ...    
    }
}

void MyApplication::HandleSomeRequest(/* .. */)
{
    int iVersion = /* ... */;
    switch (iVersion)
    {
    case 1:
        {
            ParserV1 p(/* ... */);
            HandleSomeRequest(p);
        }
        break;

    case 2:
        {
            ParserV2 p(/* ... */);
            HandleSomeRequest(p);
        }
        break;

    // ...
    }
}

The classes ParserV1, ParserV2 and etc. do not have virtual functions. They also don't inherit any interface. They just implement some functions, such as GetVertexCount.

Non-virtual interface? (Need a very performant low level abstraction)

Tags:

c++

idioms

architecture

abstraction

micro-optimization

leemes

2 Answers

templatetypedef

valdo

Recent Activity

Donate For Us

Non-virtual interface? (Need a very performant low level abstraction)

Tags:

c++

idioms

architecture

abstraction

micro-optimization

leemes

2 Answers

templatetypedef

valdo

Related questions

Recent Activity

Donate For Us