Does the C++ standard force capture-by-reference of local variables to be inefficient? [duplicate]

Tags:

I recently needed a lambda that captured multiple local variables by reference, so I made a test snippet to investigate its efficiency, and compiled it with -O3 using clang 3.6:

void do_something_with(void*);

void test()
{
    int a = 0, b = 0, c = 0;

    auto func = [&] () {
        a++;
        b++;
        c++;
    };

    do_something_with((void*)&func);
}

movl   $0x0,0x24(%rsp)
movl   $0x0,0x20(%rsp)
movl   $0x0,0x1c(%rsp)

lea    0x24(%rsp),%rax
mov    %rax,(%rsp)
lea    0x20(%rsp),%rax
mov    %rax,0x8(%rsp)
lea    0x1c(%rsp),%rax
mov    %rax,0x10(%rsp)

lea    (%rsp),%rdi
callq  ...

Clearly the lambda only needs the address of one of the variables, from which all the others could be obtained by relative addressing.

Instead, the compiler created a struct on the stack containing pointers to each local variable, and then passed the address of the struct to the lambda. It's much in the same way as if I had written:

int a = 0, b = 0, c = 0;

struct X
{
    int *pa, *pb, *pc;
};

X x = {&a, &b, &c};

auto func = [p = &x] () {
    (*p->pa)++;
    (*p->pb)++;
    (*p->pc)++;
};

This is inefficient for various reasons, but most worryingly because it could lead to heap-allocation if too many variables are captured.

My questions:

The fact that both clang and gcc do this at -O3 makes me suspect that something in the standard actually forces closures to be implemented inefficiently. Is this the case?
If so, then for what reasoning? It cannot be for binary compatibility of lambdas between compilers, because any code that knows about the type of the lambda is guaranteed to lie in the same translation unit.
If not, then why is this optimisation missing from two major compilers?

EDIT:
Here is an example of the more efficient code that I would like to have seen from the compiler. This code uses less stack space, the lambda now only performs one pointer indirection instead of two, and the lambda's size does not grow in the number of captured variables:

struct X
{
    int a = 0, b = 0, c = 0;
} x;

auto func = [&x] () {
    x.a++;
    x.b++;
    x.c++;
};

movl   $0x0,0x8(%rsp)
movl   $0x0,0xc(%rsp)
movl   $0x0,0x10(%rsp)

lea    0x8(%rsp),%rax
mov    %rax,(%rsp)

lea    (%rsp),%rdi
callq  ...

783

asked Jul 09 '15 13:07

PBS

1 Answers

It looks like unspecified behavior. The following paragraph from the C++14 draft standard: N3936 section 5.1.2 Lambda Expressions [expr.prim.lambda] makes me think this:

An entity is captured by reference if it is implicitly or explicitly captured but not captured by copy. It is unspecified whether additional unnamed non-static data members are declared in the closure type for entities captured by reference. [...]

which different for entities captured by copy:

Every id-expression within the compound-statement of a lambda-expression that is an odr-use (3.2) of an entity captured by copy is transformed into an access to the corresponding unnamed data member of the closure type.

Thanks to dyp for pointing out some relevant documents which I somehow missed. It looks like defect report 750: Implementation constraints on reference-only closure objects provides the rationale for the current wording, and it says:

Consider an example like:
void f(vector<double> vec) {
  double x, y, z;
  fancy_algorithm(vec, [&]() { /* use x, y, and z in various ways */ });
}
5.1.2 [expr.prim.lambda] paragraph 8 requires that the closure class for this lambda will have three reference members, and paragraph 12 requires that it be derived from std::reference_closure, implying two additional pointer members. Although 8.3.2 [dcl.ref] paragraph 4 allows a reference to be implemented without allocation of storage, current ABIs require that references be implemented as pointers. The practical effect of these requirements is that the closure object for this lambda expression will contain five pointers. If not for these requirements, however, it would be possible to implement the closure object as a single pointer to the stack frame, generating data accesses in the function-call operator as offsets relative to the frame pointer. The current specification is too tightly constrained.

which echos your exact points about allowing potential optimizations and was implemented as part of N2927 which includes the following:

The new wording no longer specifies any rewrite or closure members for "by reference" capture. Uses of entities captured "by reference" affect the original entities, and the mechanism to achieve this is left entirely to the implementation.

answered Oct 18 '22 15:10

Shafik Yaghmour

Related questions
                            
                                C/C++ Linux, How to find neighbors on a network Without Using IP (MAC Only)
                            
                                C++ "group where" algorithm
                            
                                c++, opencv: Is it safe to use the same Mat for both source and destination images in filtering operation?
                            
                                C++ tellg() return type
                            
                                Conditional operator + upcast + const reference
                            
                                How to use basic_ostringstream with stateful custom allocator?? (C++11)
                            
                                Is the static initialization of global variables completed before `main()`?
                            
                                Default move constructor in a sub class
                            
                                Is there a template debugger?
                            
                                Destruction order of statically initialized, non-literal objects
                            
                                How to specify compiler flag to a single source file with qmake?
                            
                                How to fix unresolved externals of SDL 2.0.3 on Visual Studio 2015 Preview?
                            
                                What is meaning of locality of data structure?
                            
                                How to build Qt with custom compiler and custom flags?
                            
                                Can a class hierarchy be safe and trivially copyable?
                            
                                Limitations of Common Subexpression Elimination in C++
                            
                                Linker error with variable templates
                            
                                Initialize member before base constructor. Possible? [duplicate]
                            
                                In C++11, how can I implement an arithmetic type that fits into the hierarchy of builtin types?
                            
                                decltype() of captured variable in lambda: GCC bug and/or Clang bug?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Does the C++ standard force capture-by-reference of local variables to be inefficient? [duplicate]

Tags:

c++

compiler-optimization

language-lawyer

lambda

PBS

People also ask

1 Answers

Shafik Yaghmour

Recent Activity

Donate For Us