Background
As an organizational strategy, I like to define function-local lambdas in complicated functions. They're good for encapsulating multi-step logic, repeated operations, and so on (the sorts of things functions are good for in general), without creating anything that's visible outside the scope where it's used. It's something of a synthesis of, or alternative to, the styles John Carmack lays out in his essay on the merits of inlining code: it keeps everything neatly bottled up in the function it's meant for, while also giving each block of functionality a compiler-recognized name that documents it. A simple, contrived example might look like this (just pretend something complex enough to merit this style were actually going on here):
    #include <iostream>

    void printSomeNumbers(void)
    {
        const auto printNumber = [](auto number) {
            std::cout << number << std::endl; // Non-trivial logic (maybe formatting) would go here
        };
        printNumber(1);
        printNumber(2.0);
    }
Semantically speaking, the compiled form of this function is "supposed" to create an instance of an implicitly defined functor (the lambda's closure type) and then call its operator() for each of the provided inputs, since that's what it means to use a lambda in C++. In optimized builds, though, the as-if rule lets the compiler inline things, so the generated code will probably just inline the body of the lambda and skip defining/instantiating the functor entirely. This sort of inlining has come up in earlier questions here and here, among other places.
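To make the "implicitly defined functor" concrete, here is a rough hand-written approximation of what the lambda in printSomeNumbers stands for. It is only a sketch: the real closure type is unnamed and compiler-generated, and the names PrintNumberClosure and printSomeNumbersDesugared are hypothetical.

```cpp
#include <iostream>
#include <type_traits>

// Hypothetical hand-written stand-in for the unnamed closure type that the
// compiler generates for `[](auto number) { ... }`. A generic lambda's
// call operator is a member function template.
struct PrintNumberClosure {
    template <typename T>
    void operator()(T number) const {
        std::cout << number << std::endl;
    }
};

// Nothing is captured, so this stand-in has no data members at all.
static_assert(std::is_empty<PrintNumberClosure>::value,
              "a capture-less closure carries no state");

void printSomeNumbersDesugared()
{
    const PrintNumberClosure printNumber{};  // create the functor instance
    printNumber(1);    // calls operator()<int>
    printNumber(2.0);  // calls operator()<double>
}
```

Calling printSomeNumbersDesugared() prints 1 and 2 on separate lines, like the lambda version. Because operator() is defined inside the class, it is implicitly inline and its body is visible at every call site, which makes it an ordinary inlining candidate.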
Question
In all of the lambda-inlining questions and answers I've found, the examples haven't used any form of lambda capture, and they mostly concern passing a lambda as a parameter to something (e.g. inlining a lambda in the context of a std::for_each call). My question, then, is this: can a compiler still inline a lambda that captures values? More specifically (since I'd assume the lifetimes of the variables involved factor into the answer quite a bit), can the compiler reasonably inline a lambda that is only used inside the function where it's defined, even if it captures some things (e.g. local variables) by reference?
My intuition is that inlining should be possible, since the compiler has full visibility into all of the code and the relevant variables (including their lifetimes relative to the lambda), but I'm not positive, and my assembly-reading skills aren't good enough for me to get a reliable answer on my own.
Additional Example
Just in case the specific use case I'm describing isn't quite clear, here's a modified version of the example above that uses the pattern I'm describing (again, please ignore the fact that the code is contrived and needlessly complicated):
    #include <iostream>
    #include <sstream>

    void printSomeNumbers(void)
    {
        std::ostringstream ss;
        const auto appendNumber = [&ss](auto number) {
            ss << number << std::endl; // Pretend this is something non-trivial
        };
        appendNumber(1);
        appendNumber(2.0);
        std::cout << ss.str();
    }
I'd expect an optimizing compiler to have enough information to completely inline every use of the lambda and not generate (or at least not keep) any functor here, even though it uses a captured-by-reference variable that "should" be treated as a member of some auto-generated closure type.
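Conceptually, the capturing lambda corresponds to something like the class below, which holds a reference to the caller's stream. This is a sketch for illustration only: AppendNumberClosure and appendSomeNumbersDesugared are hypothetical names, the standard leaves it unspecified how a by-reference capture is actually stored, and the function returns the string instead of printing it so the result is easy to check.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the closure type of `[&ss](auto number) { ... }`.
// Modelling the by-reference capture as a reference member is an assumption;
// compilers typically store such a capture as a pointer to the variable.
class AppendNumberClosure {
public:
    explicit AppendNumberClosure(std::ostringstream& ss) : ss_(ss) {}

    // const operator(), like a non-mutable lambda: the reference member
    // itself cannot be reseated, but the referenced stream can be modified.
    template <typename T>
    void operator()(T number) const {
        ss_ << number << std::endl;
    }

private:
    std::ostringstream& ss_;  // refers to the caller's local stream
};

std::string appendSomeNumbersDesugared()
{
    std::ostringstream ss;
    const AppendNumberClosure appendNumber(ss);  // "capture" ss by reference
    appendNumber(1);
    appendNumber(2.0);
    return ss.str();
}
```

Viewed this way, the only state the closure carries is where ss lives, and inside the function that defines the lambda the compiler already knows that.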
Answer
Yes.
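A lambda's call operator is an ordinary inline member function whose definition is visible wherever it's called, so it can be inlined like any other small function; captures don't change that. What remains is seeing through the closure object itself, and here modern compilers rely on "static single assignment" (SSA) form, which they use to represent code during optimization.
In SSA form, each time you assign to or modify a variable, a conceptually new value is created. Sometimes these conceptually different values must share an identity (for example, because something holds a pointer to them).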
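The renaming SSA performs can be pictured by hand in C++. This is an illustration only: compilers build SSA form in their intermediate representation, not in source code, and the function names below are hypothetical.

```cpp
// Ordinary code: one variable x is overwritten several times.
int scaleAndShift(int a)
{
    int x = a;   // x is assigned...
    x = x * 2;   // ...modified...
    x = x + 1;   // ...and modified again
    return x;
}

// The same computation written SSA-style: every assignment introduces a
// new, never-modified value instead of overwriting x.
int scaleAndShiftSsa(int a)
{
    const int x1 = a;
    const int x2 = x1 * 2;
    const int x3 = x2 + 1;
    return x3;
}
```

Both functions compute the same result (for example, 7 for an input of 3); in the second form it is obvious which value each use refers to, which is what lets the optimizer forward and eliminate values freely.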
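Identity, which is what you get when you take the address of something, is what gets in the way of this.
Simple references are turned into aliases for the value they reference; they have no identity of their own. This is part of the original design intent for references, and it's why you cannot have a pointer to a reference. The same applies to a reference captured by a lambda: once the call is inlined and the closure object's address never escapes, the compiler can treat the captured reference as just another name for ss.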
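The alias behaviour of references can be checked directly with a plain local reference. A minimal sketch; the function name referenceIsAnAlias is hypothetical.

```cpp
#include <cassert>

// A reference is just another name for an existing object: you cannot
// take the address of the reference itself.
void referenceIsAnAlias()
{
    int value = 41;
    int& alias = value;

    alias += 1;            // modifies value itself
    assert(value == 42);

    // Taking the address of the reference yields the address of the
    // referenced object; there is no separate "reference object" to point at.
    assert(&alias == &value);

    // int&* p;            // ill-formed: pointers to references do not exist
}
```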
Concretely:
    #include <cstdio>
    #include <sstream>
    #include <string>

    std::string printSomeNumbers(void)
    {
        std::ostringstream ss;
        const auto appendNumber = [&ss](auto number) {
            ss << number << "\n"; // Pretend this is something non-trivial
        };
        printf("hello\n");
        appendNumber(1);
        printf("world\n");
        appendNumber(2.0);
        printf("today\n");
        return ss.str();
    }
compiles (with optimizations enabled) to:
    printSomeNumbers[abi:cxx11](): # @printSomeNumbers[abi:cxx11]()
        push r14
        push rbx
        sub rsp, 376
        mov r14, rdi
        mov rbx, rsp
        mov rdi, rbx
        mov esi, 16
        call std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)
        mov edi, offset .Lstr
        call puts
        mov rdi, rbx
        mov esi, 1
        call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        mov esi, offset .L.str.3
        mov edx, 1
        mov rdi, rax
        call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov edi, offset .Lstr.8
        call puts
        mov rdi, rsp
        movsd xmm0, qword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero
        call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
        mov esi, offset .L.str.3
        mov edx, 1
        mov rdi, rax
        call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov edi, offset .Lstr.9
        call puts
        lea rsi, [rsp + 8]
        mov rdi, r14
        call std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const
        mov rax, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >]
        mov qword ptr [rsp], rax
        mov rcx, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >+24]
        mov rax, qword ptr [rax - 24]
        mov qword ptr [rsp + rax], rcx
        mov qword ptr [rsp + 8], offset vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >+16
        mov rdi, qword ptr [rsp + 80]
        lea rax, [rsp + 96]
        cmp rdi, rax
        je .LBB0_7
        call operator delete(void*)
    .LBB0_7:
        mov qword ptr [rsp + 8], offset vtable for std::basic_streambuf<char, std::char_traits<char> >+16
        lea rdi, [rsp + 64]
        call std::locale::~locale() [complete object destructor]
        lea rdi, [rsp + 112]
        call std::ios_base::~ios_base() [base object destructor]
        mov rax, r14
        add rsp, 376
        pop rbx
        pop r14
        ret
Godbolt
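Notice that between the printf calls (which the compiler turned into calls to puts), the only calls are direct calls into std::ostream's insertion code (operator<<(int), _M_insert<double>, and __ostream_insert for the "\n"), with the stream passed straight in. No closure object is built and no lambda call operator is ever called.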
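To make the comparison concrete, here is a hand-inlined version of the answer's printSomeNumbers, with the lambda body written out at each former call site; the assembly above has essentially this shape. This is written by hand for illustration, not taken from compiler output, and the name printSomeNumbersInlined is hypothetical.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical hand-inlined equivalent of the answer's printSomeNumbers():
// the body of appendNumber is pasted at each call site, so no closure
// object (and no captured reference) exists at all.
std::string printSomeNumbersInlined()
{
    std::ostringstream ss;
    std::printf("hello\n");
    ss << 1 << "\n";    // was: appendNumber(1)
    std::printf("world\n");
    ss << 2.0 << "\n";  // was: appendNumber(2.0)
    std::printf("today\n");
    return ss.str();
}
```

Calling it prints hello, world and today to stdout and returns the string "1\n2\n", just like the lambda version.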