
How effectively can function-local lambdas be inlined by C++ compilers?

Background

As an organizational strategy, I like to define function-local lambdas in complicated functions. It's good for encapsulating multi-step logic, repeated operations, etc. (the sorts of things that functions are good for in general), but without creating something that'll be visible outside of the scope where it's used. It's kind of a synthesis of/alternative to the styles John Carmack lays out in his essay on the merits of inlining code in that it keeps everything neatly bottled up in the function it's intended to be used in while also giving a (compiler-recognized) name to document each block of functionality. A simple, contrived example might look like this (just pretend there was actually something complex enough going on here to merit using this sort of style):

#include <iostream>

void printSomeNumbers(void)
{
  const auto printNumber = [](auto number) {
    std::cout << number << std::endl; // Non-trivial logic (maybe formatting) would go here
  };

  printNumber(1);
  printNumber(2.0);
}

Semantically speaking, the compiled form of this function is 'supposed' to create an instance of an implicitly-defined functor, then call operator() on that functor for each of the provided inputs, since that's what it means to use a lambda in C++. In optimized builds, though, the as-if rule frees the compiler to inline the calls, meaning that the generated code will probably just contain the body of the lambda and skip defining/instantiating the functor entirely. This sort of inlining has come up in past discussions here and here, among other places.
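To make the "implicitly-defined functor" description concrete, here is a rough sketch of what the first example means semantically. The names PrintNumberClosure, printSomeNumbersDesugared, captureStdout and checkLambdaMatchesFunctor are made up for illustration; the real closure type is unnamed and compiler-generated, so treat this as an approximation rather than what any particular compiler emits.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the unnamed closure type of
// [](auto number) { ... }. A generic lambda behaves like a class
// with a templated call operator.
struct PrintNumberClosure {
  template <typename T>
  void operator()(T number) const {
    std::cout << number << std::endl;
  }
};

// Same shape as printSomeNumbers, with the lambda spelled out by hand.
void printSomeNumbersDesugared()
{
  const PrintNumberClosure printNumber{};
  printNumber(1);
  printNumber(2.0);
}

// Illustration-only helper: runs fn while std::cout is redirected
// into a string, then restores std::cout.
template <typename Fn>
std::string captureStdout(Fn fn)
{
  std::ostringstream captured;
  std::streambuf* const original = std::cout.rdbuf(captured.rdbuf());
  fn();
  std::cout.rdbuf(original);
  return captured.str();
}

// Checks that the lambda and the hand-written functor print the same text.
void checkLambdaMatchesFunctor()
{
  const auto printNumber = [](auto number) {
    std::cout << number << std::endl;
  };
  const std::string viaLambda = captureStdout([&printNumber] {
    printNumber(1);
    printNumber(2.0);
  });
  const std::string viaFunctor = captureStdout(printSomeNumbersDesugared);
  assert(viaLambda == viaFunctor);
}
```

Calling checkLambdaMatchesFunctor() from main only confirms that the two forms behave the same; whether either one is inlined is a separate question that depends on the compiler and optimization level.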

Question

In all of the lambda inlining questions and answers I've found, the presented examples haven't made use of any form of lambda capture, and they also largely pertain to passing a lambda as a parameter to something (i.e. inlining a lambda in the context of a std::for_each call). My question, then, is this: can a compiler still inline a lambda which captures values? More specifically (since I'd assume that the lifetimes of the various variables involved factor into the answer quite a bit), can the compiler reasonably inline a lambda which is only used inside of the function where it's defined, even if it captures some things (i.e. local variables) by reference?

My intuition here would be that inlining should be possible, since the compiler has full visibility into all of the code and the relevant variables (including their lifetimes relative to the lambda), but I'm not positive and my assembly-reading skills aren't up to snuff enough to get a reliable answer for myself.

Additional Example

Just in case the specific use-case I'm describing isn't quite clear, here's a modified version of the lambda above which makes use of the kind of pattern I'm describing (again, please ignore the fact that the code is contrived and needlessly over-complicated):

#include <iostream>
#include <sstream>

void printSomeNumbers(void)
{
  std::ostringstream ss;
  const auto appendNumber = [&ss](auto number) {
    ss << number << std::endl; // Pretend this is something non-trivial
  };

  appendNumber(1);
  appendNumber(2.0);

  std::cout << ss.str();
}

I'd expect that an optimizing compiler should have enough information to completely inline all lambda usages and not generate (or at least not keep) any functors here, even though it's making use of a captured-by-reference variable that 'should' be treated as a member of some auto-generated closure type.
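For reference, this is roughly the transformation being asked about, written out by hand. AppendNumberClosure, viaClosure, manuallyInlined and checkSameOutput are hypothetical names; modelling the by-reference capture as a reference member is a simplification, since how a compiler actually represents such a capture is up to the implementation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the closure type of [&ss](auto number) { ... }.
// The by-reference capture is modelled here as a reference member.
struct AppendNumberClosure {
  std::ostringstream& ss;

  template <typename T>
  void operator()(T number) const {
    ss << number << std::endl;
  }
};

// What the source code asks for, with the closure spelled out.
std::string viaClosure()
{
  std::ostringstream ss;
  const AppendNumberClosure appendNumber{ss};
  appendNumber(1);
  appendNumber(2.0);
  return ss.str();
}

// What an optimizer is allowed to produce under the as-if rule:
// the lambda bodies pasted in place, with no closure object at all.
std::string manuallyInlined()
{
  std::ostringstream ss;
  ss << 1 << std::endl;
  ss << 2.0 << std::endl;
  return ss.str();
}

// The two versions are observably identical, which is what makes
// the inlined form a legal replacement.
void checkSameOutput()
{
  assert(viaClosure() == manuallyInlined());
}
```

Both functions return the same string, so the second is a valid compiled form of the first. Whether a given compiler actually gets there can only be settled by looking at its output.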

asked Mar 28 '19 19:03 by bionicOnion



1 Answer

Yes.

Modern compilers convert code to "static single assignment" (SSA) form as part of optimization.

In SSA form, each time you assign to a variable or modify it, a conceptually different value is created. Sometimes these conceptually different values still have to share an identity (a single address), so that pointers to the variable keep working.
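As a source-level picture of that renaming, here is a small sketch. Real compilers do this on their internal representation rather than on C++ source, and the function names reassigned, singleAssignment and checkSameResult are made up for illustration.

```cpp
#include <cassert>

// Ordinary code: one variable, modified in place twice.
int reassigned(int x)
{
  int v = x;
  v = v + 1;
  v = v * 2;
  return v;
}

// The same computation in single-assignment style: every modification
// produces a new, never-reassigned value.
int singleAssignment(int x)
{
  const int v0 = x;
  const int v1 = v0 + 1;
  const int v2 = v1 * 2;
  return v2;
}

// Both forms compute the same result.
void checkSameResult()
{
  assert(reassigned(3) == singleAssignment(3));
}
```

Nothing in singleAssignment needs a fixed memory location, which is why values in this form are easy for an optimizer to move around, merge or delete.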

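Identity is the thing that gets in the way of this: once you take the address of something, the compiler has to treat it as an object in memory rather than as a series of independent values, unless it can prove the address is never used in a way that matters.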

Simple references are turned into aliases for the value they reference; they have no identity. This is part of the original design intent for references, and why you cannot have a pointer to a reference.
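A small sketch of what "no identity" means at the language level. The function name aliasSharesIdentity is hypothetical; the rest uses only standard language rules.

```cpp
#include <cassert>

// sizeof applied to a reference type reports the size of the referenced type.
static_assert(sizeof(int&) == sizeof(int),
              "a reference has no size of its own");

bool aliasSharesIdentity()
{
  int value = 1;
  int& alias = value;  // another name for value, not a separate object

  alias = 2;           // writes to value
  assert(value == 2);

  // int&* p = nullptr;  // ill-formed: there are no pointers to references

  // Taking the address of the reference yields the address of value.
  return &alias == &value;
}
```

aliasSharesIdentity() returns true: there is no way to observe the reference as a separate object, so the compiler is free to replace each use of it with the thing it refers to.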

Concretely:

#include <cstdio>
#include <sstream>
#include <string>

std::string printSomeNumbers(void)
{
  std::ostringstream ss;
  const auto appendNumber = [&ss](auto number) {
    ss << number << "\n"; // Pretend this is something non-trivial
  };

  printf("hello\n");
  appendNumber(1);
  printf("world\n");
  appendNumber(2.0);
  printf("today\n");

  return ss.str();
}

compiles to:

printSomeNumbers[abi:cxx11]():           # @printSomeNumbers[abi:cxx11]()
        push    r14
        push    rbx
        sub     rsp, 376
        mov     r14, rdi
        mov     rbx, rsp
        mov     rdi, rbx
        mov     esi, 16
        call    std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)
        mov     edi, offset .Lstr
        call    puts
        mov     rdi, rbx
        mov     esi, 1
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        mov     esi, offset .L.str.3
        mov     edx, 1
        mov     rdi, rax
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov     edi, offset .Lstr.8
        call    puts
        mov     rdi, rsp
        movsd   xmm0, qword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero
        call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
        mov     esi, offset .L.str.3
        mov     edx, 1
        mov     rdi, rax
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov     edi, offset .Lstr.9
        call    puts
        lea     rsi, [rsp + 8]
        mov     rdi, r14
        call    std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const
        mov     rax, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >]
        mov     qword ptr [rsp], rax
        mov     rcx, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >+24]
        mov     rax, qword ptr [rax - 24]
        mov     qword ptr [rsp + rax], rcx
        mov     qword ptr [rsp + 8], offset vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >+16
        mov     rdi, qword ptr [rsp + 80]
        lea     rax, [rsp + 96]
        cmp     rdi, rax
        je      .LBB0_7
        call    operator delete(void*)
.LBB0_7:
        mov     qword ptr [rsp + 8], offset vtable for std::basic_streambuf<char, std::char_traits<char> >+16
        lea     rdi, [rsp + 64]
        call    std::locale::~locale() [complete object destructor]
        lea     rdi, [rsp + 112]
        call    std::ios_base::~ios_base() [base object destructor]
        mov     rax, r14
        add     rsp, 376
        pop     rbx
        pop     r14
        ret

Godbolt

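Notice that between the printf calls (which show up as puts in the assembly) there are no calls other than direct calls to the stream insertion functions (operator<<(int), _M_insert<double> and __ostream_insert). No closure object is constructed and no lambda operator() is called.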

answered Nov 10 '22 14:11 by Yakk - Adam Nevraumont
