I am writing some code for handling data. There are a number of groups of processing functions that the user can choose from, which are then applied to the dataset. I would like to implement all these groups in separate places, but since they all take the same parameters and all do similar things, I would like them to have a common interface.
Being a good little C++ programmer, my first thought was to simply use polymorphism: just create some abstract class with the desired interface and then derive each set of processing objects from that. My hopes were quickly dashed, however, when I thought of another wrinkle. These datasets are enormous, so the functions in question get called literally billions of times. While dynamic lookup is fairly cheap, as I understand it, it is a good deal slower than a standard function call.
My current idea to combat this is to use function pointers, in a manner something like this:
void dataProcessFunc1(mpz_class &input){...}
void dataProcessFunc2(mpz_class &input){...}
...
class DataProcessInterface
{
...
    void (*func1)(mpz_class&);
    void (*func2)(mpz_class&);
...
};
With some sort of constructor or something for setting up the pointers to point at the right things.
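Something like this, maybe (the constructor below is just a sketch of what I have in mind, not code I've written yet):

#include <gmpxx.h>  // for mpz_class

class DataProcessInterface
{
public:
    // Store pointers to whichever group of functions the user selected.
    DataProcessInterface(void (*f1)(mpz_class&), void (*f2)(mpz_class&))
        : func1(f1), func2(f2) {}

    void process(mpz_class &input)
    {
        func1(input);   // dispatch through the stored pointers
        func2(input);
    }

private:
    void (*func1)(mpz_class&);
    void (*func2)(mpz_class&);
};

// Set up once, based on the user's choice:
// DataProcessInterface iface(dataProcessFunc1, dataProcessFunc2);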
So I guess my question is this: Is this a good method? Is there another way? Or should I just learn to stop worrying and love the dynamic lookup?
A virtual function call is a function call via a pointer. The overhead is generally about the same as an explicit function call via a pointer. In other words, your idea is likely to gain very little (quite possibly nothing at all).
My immediate reaction would be to start with virtual functions, and only worry about something else when/if a profiler shows that the overhead of virtual calls is becoming significant.
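For example, the straightforward virtual-function version might look roughly like this (the class and member names are only illustrative):

#include <gmpxx.h>  // for mpz_class

// Abstract interface; each group of processing functions derives from it.
class DataProcessor
{
public:
    virtual ~DataProcessor() {}
    virtual void func1(mpz_class &input) = 0;
    virtual void func2(mpz_class &input) = 0;
};

// One concrete group of processing functions.
class GroupAProcessor : public DataProcessor
{
public:
    virtual void func1(mpz_class &input) { /* group-A processing */ }
    virtual void func2(mpz_class &input) { /* group-A processing */ }
};

// The inner loop then calls through a DataProcessor* (or reference):
//     proc->func1(element);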
When/if that occurs, another possibility would be to define the interface in a class template, then put the various implementations of that interface into specializations of the template. This normally eliminates all run-time overhead (though it's often a fair amount of extra work).
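A rough sketch of that idea, with the tag types invented purely for the example:

#include <gmpxx.h>  // for mpz_class

// Tag types that select a processing group at compile time.
struct GroupA {};
struct GroupB {};

// Primary template is only declared; each group supplies a specialization.
template <typename Group>
struct Processor;

template <>
struct Processor<GroupA>
{
    static void func1(mpz_class &input) { /* group-A processing */ }
    static void func2(mpz_class &input) { /* group-A processing */ }
};

template <>
struct Processor<GroupB>
{
    static void func1(mpz_class &input) { /* group-B processing */ }
    static void func2(mpz_class &input) { /* group-B processing */ }
};

// The hot loop is instantiated per group, so the calls are direct and inlinable:
//     Processor<GroupA>::func1(element);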
I don't agree with the answer above that says a template-based solution could have worse overhead or run time. In fact, template-based solutions let you write faster code by removing the need for virtual functions or calls through pointers (I agree, though, that using those mechanisms still does not impose a significant overhead).
Suppose that you configure your processing interface using a series of "traits", that is, processing parts or functions that a client can configure to tune the processing interface. Imagine, as an example, a class parameterized on three processing steps:
template <typename Proc1, typename Proc2 = do_nothing, typename Proc3 = do_nothing>
struct ProcessingInterface
{
    static void process(mpz_class &element) {
        Proc1::process(element);
        Proc2::process(element);
        Proc3::process(element);
    }
};
If a client has different "processors", each with a static function "process" that knows how to process an element, you can write a class like this to "combine" those three processing steps. Note that the default do_nothing class has an empty process method:
class do_nothing
{
public:
    static void process(mpz_class&) {}
};
These calls have no overhead; they are normal calls, and a client can configure a processing chain using ProcessingInterface<Facet1, Facet2>::process(data).
This is only applicable if you know the different "facets" or "processors" at compile time, which seems to be the case with your first example.
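As a (hypothetical) usage example, a client could define its own processors and combine them like this:

#include <gmpxx.h>  // for mpz_class

// Client-defined processors; the names are only examples.
struct Normalize
{
    static void process(mpz_class &element) { element = abs(element); }
};

struct Scale
{
    static void process(mpz_class &element) { element *= 2; }
};

// Combine two facets; the third parameter defaults to do_nothing.
typedef ProcessingInterface<Normalize, Scale> MyProcessing;

// In the inner loop the calls are ordinary static calls and can be inlined:
//     MyProcessing::process(data[i]);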
Note also that you can write a more sophisticated class by using metaprogramming facilities such as the Boost.MPL library, to include more processor classes, iterate through them, and so on.
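For instance, if C++11 is available, a variadic template is one way to extend the same idea to an arbitrary number of processors without reaching for Boost.MPL (just a sketch):

#include <gmpxx.h>  // for mpz_class

// Applies every listed processor to the element, in order.
template <typename... Procs>
struct ProcessingChain
{
    static void process(mpz_class &element)
    {
        // Pack expansion: calls Procs::process(element) for each processor.
        int dummy[] = { 0, (Procs::process(element), 0)... };
        (void)dummy;  // silence "unused variable" warnings
    }
};

// Usage (with the hypothetical processors above):
//     ProcessingChain<Normalize, Scale>::process(data);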