C++11 std::function slower than virtual calls?

Tags: c++, performance, c++11

I am creating a mechanism which allows users to form arbitrarily complex functions from basic building blocks using the decorator pattern. This works fine functionality-wise, but I don't like the fact that it involves a lot of virtual calls, particularly when the nesting depth becomes large. It worries me because the complex function may be called often (more than 100,000 times).

To avoid this problem, I tried to turn the decorator scheme into a std::function once it was finished (cf. to_function() in the SSCCE). All internal function calls are wired during construction of the std::function. I figured this would be faster to evaluate than the original decorator scheme, because no virtual lookups need to be performed in the std::function version.

Alas, benchmarks prove me wrong: the decorator scheme is in fact faster than the std::function I built from it. So now I am left wondering why. Maybe my test setup is faulty since I only use two trivial basic functions, which means the vtable lookups may be cached?

The code I used is included below; unfortunately, it is quite long.

SSCCE

// sscce.cpp
#include <iostream>
#include <vector>
#include <memory>
#include <functional>
#include <random>

/**
 * Base class for Pipeline scheme (implemented via decorators)
 */
class Pipeline {
protected:
    std::unique_ptr<Pipeline> wrappee;
    Pipeline(std::unique_ptr<Pipeline> wrap)
    :wrappee(std::move(wrap)){}
    Pipeline():wrappee(nullptr){}

public:
    typedef std::function<double(double)> FnSig;
    double operator()(double input) const{
        if(wrappee.get()) input=wrappee->operator()(input);
        return process(input);
    }

    virtual double process(double input) const=0;
    virtual ~Pipeline(){}

    // Returns a std::function which contains the entire Pipeline stack.
    virtual FnSig to_function() const=0;
};

/**
 * CRTP for to_function().
 */
template <class Derived>
class Pipeline_CRTP : public Pipeline{
protected:
    Pipeline_CRTP(const Pipeline_CRTP<Derived> &o):Pipeline(o){}
    Pipeline_CRTP(std::unique_ptr<Pipeline> wrappee)
    :Pipeline(std::move(wrappee)){}
    Pipeline_CRTP():Pipeline(){};
public:
    typedef typename Pipeline::FnSig FnSig;

    FnSig to_function() const override{
        if(Pipeline::wrappee.get()!=nullptr){

            FnSig wrapfun = Pipeline::wrappee->to_function();
            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(wrapfun(input));
            };
            return std::move(fun);

        }else{

            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(input);
            };
            return std::move(fun);
        }

    }

    virtual ~Pipeline_CRTP(){}
};

/**
 * First concrete derived class: simple scaling.
 */
class Scale: public Pipeline_CRTP<Scale>{
private:
    double scale_;
public:
    Scale(std::unique_ptr<Pipeline> wrap, double scale) // todo move
    :Pipeline_CRTP<Scale>(std::move(wrap)),scale_(scale){}
    Scale(double scale):Pipeline_CRTP<Scale>(),scale_(scale){}

    double process(double input) const override{
        return input*scale_;
    }
};

/**
 * Second concrete derived class: offset.
 */
class Offset: public Pipeline_CRTP<Offset>{
private:
    double offset_;
public:
    Offset(std::unique_ptr<Pipeline> wrap, double offset) // todo move
    :Pipeline_CRTP<Offset>(std::move(wrap)),offset_(offset){}
    Offset(double offset):Pipeline_CRTP<Offset>(),offset_(offset){}

    double process(double input) const override{
        return input+offset_;
    }
};

int main(){

    // used to make a random function / arguments
    // to prevent gcc from being overly clever
    std::default_random_engine generator;
    auto randint = std::bind(std::uniform_int_distribution<int>(0,1),std::ref(generator));
    auto randdouble = std::bind(std::normal_distribution<double>(0.0,1.0),std::ref(generator));

    // make a complex Pipeline
    std::unique_ptr<Pipeline> pipe(new Scale(randdouble()));
    for(unsigned i=0;i<100;++i){
        if(randint()) pipe=std::move(std::unique_ptr<Pipeline>(new Scale(std::move(pipe),randdouble())));
        else pipe=std::move(std::unique_ptr<Pipeline>(new Offset(std::move(pipe),randdouble())));
    }

    // make a std::function from pipe
    Pipeline::FnSig fun(pipe->to_function());

    double bla=0.0;
    for(unsigned i=0; i<100000; ++i){
#ifdef USE_FUNCTION
        // takes 110 ms on average
        bla+=fun(bla);
#else
        // takes 60 ms on average
        bla+=pipe->operator()(bla);
#endif
    }

    std::cout << bla << std::endl;
}

Benchmark

Using pipe:

g++ -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 60 ms

Using fun:

g++ -DUSE_FUNCTION -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 110 ms


asked Sep 04 '13 08:09

Marc Claesen

1 Answer

You have std::functions binding lambdas that call std::functions that bind lambdas that call std::functions that ...

Look at your to_function. It creates a lambda that calls two std::functions, and returns that lambda bound into another std::function. The compiler won't resolve any of these statically.

So in the end, you end up with just as many indirect calls as the virtual function solution, and that is only if you get rid of the bound processfun and call process directly in the lambda. Otherwise you have twice as many.

If you want a speedup, you will have to create the entire pipeline in a way that can be statically resolved, and that means a lot more templates before you can finally erase the type into a single std::function.

answered Oct 11 '22 11:10

Sebastian Redl
