Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are by-value parameters excluded from NRVO?

Imagine:

S f(S a) {
  return a;
}

Why is it not allowed to alias a and the return value slot?

S s = f(t);
S s = t; // can't generally transform it to this :(

The spec doesn't allow this transformation if the copy constructor of S has side effects. Instead, it requires at least two copies (one from t to a, and one from a to the return value, and another from the return value to s, and only that last one can be elided. Note that I wrote = t above to represent the fact of a copy of t to f's a, the only copy which would still be mandatory in the presence of side effects of move/copy constructor).

Why is that?

like image 710
Johannes Schaub - litb Avatar asked May 15 '11 15:05

Johannes Schaub - litb


3 Answers

Here's why copy elision doesn't make sense for parameters. It's really about the implementation of the concept at the compiler level.

Copy elision works by essentially constructing the return value in-place. The value isn't copied out; it's created directly in its intended destination. It's the caller who provides the space for the intended output, and thus it's ultimately the caller who provides the possibility for the elision.

All that the function internally needs to do in order to elide the copy is construct the output in the place provided by the caller. If the function can do this, you get copy elision. If the function can't, then it will use one or more temporary variables to store the intermediate results, then copy/move this into the place provided by the caller. It's still constructed in-place, but the construction of the output happens via copy.

So the world outside of a particular function doesn't have to know or care about whether a function does elision. Specifically, the caller of the function doesn't have to know about how the function is implemented. It's not doing anything different; it's the function itself that decides if elision is possible.

Storage for value parameters is also provided by the caller. When you call f(t), it is the caller that creates the copy of t and passes it to f. Similarly, if S is implicitly constructable from an int, then f(5) will construct an S from the 5 and pass it to f.

This is all done by the caller. The callee doesn't know or care that it was a variable or a temporary; it's just given a spot of stack memory (or registers or whatever).

Now remember: copy elision works because the function being called constructs the variable directly into the output location. So if you're trying to elide the return from a value parameter, then the storage for the value parameter must also be the output storage itself. But remember: it is the caller that provides that storage for both the parameter and the output. And therefore, to elide the output copy, the caller must construct the parameter directly into the output.

To do this, now the caller needs to know that the function it's calling will elide the return value, because it can only stick the parameter directly into the output if the parameter will be returned. That's not going to generally be possible at the compiler level, because the caller doesn't necessarily have the implementation of the function. If the function is inlined, then maybe it can work. But otherwise no.

Therefore, the C++ committee didn't bother to allow for the possibility.

like image 83
Nicol Bolas Avatar answered Nov 04 '22 08:11

Nicol Bolas


The rationale, as I understand it, for that restriction is that the calling convention might (and will in many cases) demand that the argument to the function and the return object are at different locations (either memory or registers). Consider the following modified example:

X foo();
X bar( X a ) 
{ 
   return a;
}
int main() {
   X x = bar( foo() );
}

In theory the whole set of copies would be return statement in foo ($tmp1), argument a of bar, return statement of bar ($tmp2) and x in main. Compilers can elide two of the four objects by creating $tmp1 at the location of a and $tmp2 at the location of x. When the compiler is processing main it can note that the return value of foo is the argument to bar and can make them coincide, at that point it cannot possibly know (without inlining) that the argument and return of bar are the same object, and it has to comply with the calling convention, so it will place $tmp1 in the position of the argument to bar.

At the same time, it knows that the purpose of $tmp2 is only creating x, so it can place both at the same address. Inside bar, there is not much that can be done: the argument a is located in place of the first argument, according to the calling convention, and $tmp2 has to be located according to the calling convention, (in the general case in a different location, think that the example can be extended to a bar that takes more arguments, only one of which is used as return statement.

Now, if the compiler performs inlining it could detect that the extra copy that would be required if the function was not inlined is really not needed, and it would have a chance for eliding it. If the standard would allow for that particular copy to be elided, then the same code would have different behaviors depending on whether the function is inlined or not.

like image 3
David Rodríguez - dribeas Avatar answered Nov 04 '22 10:11

David Rodríguez - dribeas


David Rodríguez - dribeas answer to my question 'How to allow copy elision construction for C++ classes' gave me the following idea. The trick is to use lambdas to delay evaluation til inside the function body:

#include <iostream>

struct S
{
  S() {}
  S(const S&) { std::cout << "Copy" << std::endl; }
  S(S&&) { std::cout << "Move" << std::endl; }
};

S f1(S a) {
  return a;
}

S f2(const S& a) {
  return a;
}

#define DELAY(x) [&]{ return x; }

template <class F>
S f3(const F& a) {
  return a();
}

int main()
{
  S t;
  std::cout << "Without delay:" << std::endl;
  S s1 = f1(t);
  std::cout << "With delay:" << std::endl;
  S s2 = f3(DELAY(t));
  std::cout << "Without delay pass by ref:" << std::endl;
  S s3 = f2(t);
  std::cout << "Without delay pass by ref (temporary) (should have 0 copies, will get 1):" << std::endl;
  S s4 = f2(S());
  std::cout << "With delay (temporary) (no copies, best):" << std::endl;
  S s5 = f3(DELAY(S()));
}

This outputs on ideone GCC 4.5.1:

Without delay:
Copy
Copy
With delay:
Copy

Now this is good, but one could suggest that the DELAY version is just like passing by const reference, as below:

Without delay pass by ref:
Copy

But if we pass a temporary by const reference, we still get a copy:

Without delay pass by ref (temporary) (should have 0 copies, will get 1):
Copy

Where the delayed version elides the copy:

With delay (temporary) (no copies, best):

As you can see, this elides all copies in the temporary case.

The delayed version produces one copy in the non-temporary case, and no copies in the case of a temporary. I don't know any way to achieve this other than lambdas, but I'd be interested if there is.

like image 2
Clinton Avatar answered Nov 04 '22 09:11

Clinton