Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

High-performance code in c++ (inheritance, pointers to functions, if)

Suppose you have a very large graph with lots of processing upon its nodes (like tens of milions of operations per node). The core routine is the same for each node, but there are some additional operations based on internal conditions. There can be 2 such conditions which produces 4 cases (0,0), (1,0), (0,1), (1,1). E.g. (1,1) means that both conditions hold. Conditions are established once (one set for each node independently) in a program and, from then on, never change. Unfortunately, they are determined in runtime and in a fully unpredictable way (based on data received via HTTP from and external server).

What is the fastest in such scenario? (taken into account modern compiler optimizations which I have no idea of)

  • simply using "IFs": if (condition X) perform additional operation X.
  • using inheritance to derive four classes from base class (exposing method OPERATION) to have a proper operation and save tens milions of "ifs". [but I am not sure if this is a real saving, inheritance must have its overhead too)
  • use pointer to function to assign the function based on conditions once.

I would take me long to come to a point where I can test it by myself (I don't have such a big data yet and this will be integrated into bigger project so would not be easy to test all versions).

Reading answers: I know that I probably have to experiment with it. But apart from everything, this is sort of a question what is faster:

tens of milions of IF statements and normal statically known function calls VS function pointer calls VS inheritance which I think is not the best idea in this case and I am thinking of eliminating it from further inspection Thanks for any constructive answers (not saying that I shouldn't care about such minor things ;)

like image 648
user1314926 Avatar asked Nov 14 '13 13:11

user1314926


2 Answers

There is no real answer except to measure the actual code on the real data. At times, in the past, I've had to deal with such problems, and in the cases I've actually measured, virtual functions were faster than if's. But that doesn't mean much, since the cases I measured were in a different program (and thus a different context) than yours. For example, a virtual function call will generally prevent inlining, whereas an if is inline by nature, and inlining may open up additional optimization possibilities.

Also the machines I measured on handled virtual functions pretty well; I've heard that some other machines (HP's PA, for example) are very ineffective in their implementation of indirect jumps (including not only virtual function calls, but also the return from the function---again, the lost opportunity to inline costs).

like image 55
James Kanze Avatar answered Sep 26 '22 15:09

James Kanze


If you absolutely have to have the fastest way, and the process order of the nodes is not relevant, make four different types, one for each case, and define a process function for each. Then in a container class, have four vectors, one for each node type. Upon creation of a new node get all the data you need to create the node, including the conditions, and create a node of the correct type and push it into the correct vector. Then, when you need to process all your nodes, process them type by type, first processing all nodes in the first vector, then the second, etc.

Why would you want to do this:

  • No ifs for the state switching
  • No vtables
  • No function indirection

But much more importantly:

  • No instruction cache thrashing (you're not jumping to a different part of your code for every next node)
  • No branch prediction misses for state switching ifs (since there are none)

Even if you'd have inheritance with virtual functions and thus function indirection through vtables, simply sorting the nodes by their type in your vector may already make a world of difference in performance as any possible instruction cache thrashing would essentially be gone and depending on the methods of branch prediction the branch prediction misses could also be reduced.

Also, don't make a vector of pointers, but make a vector of objects. If they are pointers you have an extra adressing indirection, which in itself is not that worrisome, but the problem is that it may lead to data cache thrashing if the objects are pretty much randomly spread throughout your memory. If on the other hand your objects are directly put into the vector the processing will basically go through memory linearly and the cache will basically hit every time and cache prefetching might actually be able to do a good job.

Note though that you would pay heavily in data structure creation if you don't do it correctly, if at all possible, when making the vector reserve enough capacity in it immediately for all your nodes, reallocating and moving every time your vector runs out of space can become expensive.

Oh, and yes, as James mentioned, always, always measure! What you think may be the fastest way may not be, sometimes things are very counter intuitive, depending on all kinds of factors like optimizations, pipelining, branch prediction, cache hits/misses, data structure layout, etc. What I wrote above is a pretty general approach, but it is not guaranteed to be the fastest and there are definitely ways to do it wrong. Measure, Measure, Measure.

P.S. inheritance with virtual functions is roughly equivalent to using function pointers. Virtual functions are usually implemented by a vtable at the head of the class, which is basically just a table of function pointers to the implementation of the given virtual for the actual type of the object. Whether ifs is faster than virtuals or the other way around is a very very difficult question to answer and depends completely on the implementation, compiler and platform used.

like image 28
wich Avatar answered Sep 23 '22 15:09

wich