Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding the overhead of lambda functions in C++11

This was already touched in Why C++ lambda is slower than ordinary function when called multiple times? and C++0x Lambda overhead But I think my example is a bit different from the discussion in the former and contradicts the result in the latter.

On the search for a bottleneck in my code I found a recusive template function that processes a variadic argument list with a given processor function, like copying the value into a buffer.

template <typename T> void ProcessArguments(std::function<void(const T &)> process) {}  template <typename T, typename HEAD, typename ... TAIL> void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail) {   process(head);   ProcessArguments(process, tail...); } 

I compared the runtime of a program that uses this code together with a lambda function as well as a global function that copies the arguments into a global buffer using a moving pointer:

int buffer[10]; int main(int argc, char **argv) {   int *p = buffer;    for (unsigned long int i = 0; i < 10E6; ++i)   {     p = buffer;     ProcessArguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);   } } 

compiled with g++ 4.6 and -O3 measuring with the tool time takes more than 6 seconds on my machine while

int buffer[10]; int *p = buffer; void CopyIntoBuffer(const int &value) {   *p++ = value; }  int main(int argc, char **argv) {   int *p = buffer;    for (unsigned long int i = 0; i < 10E6; ++i)   {     p = buffer;     ProcessArguments<int>(CopyIntoBuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);   }    return 0; } 

takes about 1.4 seconds.

I do not get what is going on behind the scenes that explains the time overhead and am wondering if I can change something to make use of lambda functions without paying with runtime.

like image 218
mcbulba Avatar asked Sep 04 '13 16:09

mcbulba


People also ask

What is lambda function in C++11?

In C++11 and later, a lambda expression—often called a lambda—is a convenient way of defining an anonymous function object (a closure) right at the location where it's invoked or passed as an argument to a function.

Are Lambda functions expensive in C++?

It's cheap. (Depending on your definition of cheap!) Memory wise: As a lambda will generate a class, it will be as expensive as creating an equivalent class that holds the same number of variables as you capture.

How does lambda capture work?

By default, variables are captured by const value . This means when the lambda is created, the lambda captures a constant copy of the outer scope variable, which means that the lambda is not allowed to modify them. In the following example, we capture the variable ammo and try to decrement it.

Where are lambdas stored C++?

Lambda expressions can capture of outer scope variables into a lambda function. The captured variables are stored in the closure object created for the lambda function.


1 Answers

The problem here is your usage of std::function. You send it by copy and therefore copying its contents (and doing that recursively as you unwind parameters).

Now, for pointer to function, contents is, well, just pointer to function. For lambda, contents are at least pointer to function + reference that you captured. This is twice as much to copy. Plus, because of std::function's type erasure copying any data will most likely be slower (not inlined).

There are several options here, and the best would probably be passing not std::function, but template instead. The benefits are that your method call is more likely to be inlined, no type erasure happens by std::function, no copying happens, everything is so very good. Like that:

template <typename TFunc> void ProcessArguments(const TFunc& process) {}  template <typename TFunc, typename HEAD, typename ... TAIL> void ProcessArguments(const TFunc& process, const HEAD &head, const TAIL &... tail) {   process(head);   ProcessArguments(process, tail...); } 

Second option is doing the same, but sending the process by copy. Now, copying does happen, but still is neatly inlined.

What's equally important is that process' body can also be inlined, especially for lamda. Depending on complexity of copying the lambda object and its size, passing by copy may or may not be faster than passing by reference. It may be faster because compiler may have harder time reasoning about reference than the local copy.

template <typename TFunc> void ProcessArguments(TFunc process) {}  template <typename TFunc, typename HEAD, typename ... TAIL> void ProcessArguments(TFunc process, const HEAD &head, const TAIL &... tail) {   process(head);   ProcessArguments(process, tail...); } 

Third option is, well, try passing std::function<> by reference. This way you at least avoid copying, but calls will not be inlined.

Here are some perf results (using ideones' C++11 compiler). Note that, as expected, inlined lambda body is giving you best performance:

Original function: 0.483035s  Original lambda: 1.94531s   Function via template copy: 0.094748  ### Lambda via template copy: 0.0264867s   Function via template reference: 0.0892594s  ### Lambda via template reference: 0.0264201s   Function via std::function reference: 0.0891776s  Lambda via std::function reference: 0.09s 
like image 132
Artem Tokmakov Avatar answered Oct 09 '22 11:10

Artem Tokmakov