I am creating a mechanism which allows users to form arbitrary complex functions from basic building blocks using the decorator pattern. This works fine functionality wise, but I don't like the fact that it involves a lot of virtual calls, particularly when the nesting depth becomes large. It worries me because the complex function may called often (>100.000 times).
To avoid this problem, I tried to turn the decorator scheme into a std::function
once it was finished (cfr. to_function()
in the SSCCE). All internal function calls are wired during construction of the std::function
. I figured this would be faster to evaluate than the original decorator scheme because no virtual lookups need to be performed in the std::function
version.
Alas, benchmarks prove me wrong: the decorator scheme is in fact faster than the std::function
I built from it. So now I am left wondering why. Maybe my test setup is faulty since I only use two trivial basic functions, which means the vtable lookups may be cached?
The code I used is included below, unfortunately it is quite long.
// sscce.cpp #include <iostream> #include <vector> #include <memory> #include <functional> #include <random> /** * Base class for Pipeline scheme (implemented via decorators) */ class Pipeline { protected: std::unique_ptr<Pipeline> wrappee; Pipeline(std::unique_ptr<Pipeline> wrap) :wrappee(std::move(wrap)){} Pipeline():wrappee(nullptr){} public: typedef std::function<double(double)> FnSig; double operator()(double input) const{ if(wrappee.get()) input=wrappee->operator()(input); return process(input); } virtual double process(double input) const=0; virtual ~Pipeline(){} // Returns a std::function which contains the entire Pipeline stack. virtual FnSig to_function() const=0; }; /** * CRTP for to_function(). */ template <class Derived> class Pipeline_CRTP : public Pipeline{ protected: Pipeline_CRTP(const Pipeline_CRTP<Derived> &o):Pipeline(o){} Pipeline_CRTP(std::unique_ptr<Pipeline> wrappee) :Pipeline(std::move(wrappee)){} Pipeline_CRTP():Pipeline(){}; public: typedef typename Pipeline::FnSig FnSig; FnSig to_function() const override{ if(Pipeline::wrappee.get()!=nullptr){ FnSig wrapfun = Pipeline::wrappee->to_function(); FnSig processfun = std::bind(&Derived::process, static_cast<const Derived*>(this), std::placeholders::_1); FnSig fun = [=](double input){ return processfun(wrapfun(input)); }; return std::move(fun); }else{ FnSig processfun = std::bind(&Derived::process, static_cast<const Derived*>(this), std::placeholders::_1); FnSig fun = [=](double input){ return processfun(input); }; return std::move(fun); } } virtual ~Pipeline_CRTP(){} }; /** * First concrete derived class: simple scaling. */ class Scale: public Pipeline_CRTP<Scale>{ private: double scale_; public: Scale(std::unique_ptr<Pipeline> wrap, double scale) // todo move :Pipeline_CRTP<Scale>(std::move(wrap)),scale_(scale){} Scale(double scale):Pipeline_CRTP<Scale>(),scale_(scale){} double process(double input) const override{ return input*scale_; } }; /** * Second concrete derived class: offset. */ class Offset: public Pipeline_CRTP<Offset>{ private: double offset_; public: Offset(std::unique_ptr<Pipeline> wrap, double offset) // todo move :Pipeline_CRTP<Offset>(std::move(wrap)),offset_(offset){} Offset(double offset):Pipeline_CRTP<Offset>(),offset_(offset){} double process(double input) const override{ return input+offset_; } }; int main(){ // used to make a random function / arguments // to prevent gcc from being overly clever std::default_random_engine generator; auto randint = std::bind(std::uniform_int_distribution<int>(0,1),std::ref(generator)); auto randdouble = std::bind(std::normal_distribution<double>(0.0,1.0),std::ref(generator)); // make a complex Pipeline std::unique_ptr<Pipeline> pipe(new Scale(randdouble())); for(unsigned i=0;i<100;++i){ if(randint()) pipe=std::move(std::unique_ptr<Pipeline>(new Scale(std::move(pipe),randdouble()))); else pipe=std::move(std::unique_ptr<Pipeline>(new Offset(std::move(pipe),randdouble()))); } // make a std::function from pipe Pipeline::FnSig fun(pipe->to_function()); double bla=0.0; for(unsigned i=0; i<100000; ++i){ #ifdef USE_FUNCTION // takes 110 ms on average bla+=fun(bla); #else // takes 60 ms on average bla+=pipe->operator()(bla); #endif } std::cout << bla << std::endl; }
Using pipe
:
g++ -std=gnu++11 sscce.cpp -march=native -O3 sudo nice -3 /usr/bin/time ./a.out -> 60 ms
Using fun
:
g++ -DUSE_FUNCTION -std=gnu++11 sscce.cpp -march=native -O3 sudo nice -3 /usr/bin/time ./a.out -> 110 ms
If it is small, like 3-5 CPU instructions then yes std::function will make it slower, because std::function is not inlined into outer calling code. You should use only lambda and pass lambda as template parameter to other functions, lambdas are inlined into calling code.
The virtual loop took 92ms longer than the direct loop, so the additional overhead per call was 7 nanoseconds per function. From this I conclude: yes, virtual functions are much slower than direct functions, and no, unless you're planning on calling them ten million times per second, it doesn't matter.
Note that when talking about overhead of std::function (i.e. that std::function is "heavy"), usually the performance overhead is meant. Its memory overhead will be minor (around 2 extra pointers, I'd guess).
Class template std::function is a general-purpose polymorphic function wrapper. Instances of std::function can store, copy, and invoke any Callable target -- functions, lambda expressions, bind expressions, or other function objects, as well as pointers to member functions and pointers to data members.
You have std::function
s binding lambdas that call std::function
s that bind lamdbas that call std::function
s that ...
Look at your to_function
. It creates a lambda that calls two std::function
s, and returns that lambda bound into another std::function
. The compiler won't resolve any of these statically.
So in the end, you end with with just as many indirect calls as the virtual function solution, and that's if you get rid of the bound processfun
and directly call it in the lambda. Otherwise you have twice as many.
If you want a speedup, you will have to create the entire pipeline in a way that can be statically resolved, and that means a lot more templates before you can finally erase the type into a single std::function
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With