Are compilers clever enough to std::move variables going out of scope?

Consider the following piece of code:

std::vector<int> Foo() {
    std::vector<int> v = Bar();
    return v;
}

return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:

void Foo(std::vector<int> * to_be_filled) {
    std::vector<int> v = Bar();
    *to_be_filled = v;
}

A similar argument could be made here: *to_be_filled = v could conceivably be compiled to an O(1) move-assignment, since v is a local variable that's going out of scope. It should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use. Is this the case? Is there a subtle reason why not?

Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:

void Foo(std::vector<int> * to_be_filled) {
  if (Baz()) {
    std::vector<int> v = Bar();
    *to_be_filled = v;
  }
  ...
}

Do compilers do this? Can they? And is it useful or reasonable to expect them to find patterns such as *to_be_filled = v and automatically treat v as an rvalue there?


Edit:

g++ 7.3.0 does not perform any such optimizations in -O3 mode.

André Harder asked Jun 16 '18 00:06




1 Answer

The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it, such as in a return statement (and only when it's of the form return <identifier>;).

*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.
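If you want the move, you have to ask for it with std::move. Here is a minimal sketch of the difference. The Counted wrapper, the two counters, and the FillByCopy/FillByMove names are hypothetical, made up only so the chosen assignment operator becomes visible:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical counters, used only to make the chosen operator visible.
int g_copy_assigns = 0;
int g_move_assigns = 0;

// Hypothetical wrapper around the vector from the question.
struct Counted {
    std::vector<int> data;

    Counted() = default;
    explicit Counted(std::vector<int> d) : data(std::move(d)) {}
    Counted(const Counted&) = default;
    Counted(Counted&&) = default;

    Counted& operator=(const Counted& other) {
        data = other.data;             // deep copy of the elements
        ++g_copy_assigns;
        return *this;
    }
    Counted& operator=(Counted&& other) {
        data = std::move(other.data);  // takes over the source's contents
        ++g_move_assigns;
        return *this;
    }
};

void FillByCopy(Counted* to_be_filled) {
    assert(to_be_filled != nullptr);
    Counted v(std::vector<int>{1, 2, 3});
    *to_be_filled = v;             // v is an lvalue, so copy assignment runs
}

void FillByMove(Counted* to_be_filled) {
    assert(to_be_filled != nullptr);
    Counted v(std::vector<int>{1, 2, 3});
    *to_be_filled = std::move(v);  // explicit rvalue cast, so move assignment runs
}
```

Calling FillByCopy bumps only the copy counter, even though v dies right after the assignment. Calling FillByMove bumps only the move counter.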

You wrote: "My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor."

That's not how it works. NRVO would eliminate the move/copy entirely. But treating the operand of return <identifier>; as an rvalue is not an "optimization". The standard actually requires compilers to treat it as an rvalue.

Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.
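A small sketch of that "elide or move, never copy" rule. The Tracked type, its counters, and MakeTracked are hypothetical names used only for illustration:

```cpp
#include <cassert>

// Hypothetical counters for the two constructors of interest.
int g_copy_ctor_calls = 0;
int g_move_ctor_calls = 0;

// Hypothetical type that records how it gets constructed from another object.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copy_ctor_calls; }
    Tracked(Tracked&&) { ++g_move_ctor_calls; }
};

Tracked MakeTracked() {
    Tracked v;
    return v;  // either elided (NRVO) or move-constructed; the copy
               // constructor is not selected while a move constructor exists
}
```

After Tracked t = MakeTracked();, the copy counter stays at zero regardless of whether the compiler performed NRVO. The move counter is zero when the elision happens and nonzero when it doesn't (for example with elision disabled by a compiler flag).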

"Is there a subtle reason why not?"

One reason this isn't allowed is that the location of a statement should not arbitrarily change what that statement does. See, return <identifier>; will always move from the identifier (if it's a local variable). It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.

That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.
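To see why the choice between copy and move can't silently depend on what follows, here is a sketch where a later line reads v. It uses std::shared_ptr instead of the question's vector, purely because a moved-from shared_ptr is guaranteed to be empty. The Payload alias and both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Assumption for illustration: a shared_ptr payload, since its moved-from
// state (empty) is well defined and safe to inspect.
using Payload = std::shared_ptr<std::vector<int>>;

std::size_t CopyThenRead(Payload* to_be_filled) {
    assert(to_be_filled != nullptr);
    Payload v = std::make_shared<std::vector<int>>(3, 7);
    *to_be_filled = v;                  // copy: v still owns the vector
    return v ? v->size() : 0u;          // a later line that reads v
}

std::size_t MoveThenRead(Payload* to_be_filled) {
    assert(to_be_filled != nullptr);
    Payload v = std::make_shared<std::vector<int>>(3, 7);
    *to_be_filled = std::move(v);       // move: v is left empty
    return v ? v->size() : 0u;          // the same later line sees an empty v
}
```

The two functions differ only in the std::move, yet the trailing read gives a different answer. If the compiler picked copy or move based on whether such a trailing line exists, adding or removing that line would change what the assignment itself does.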

Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.

By contrast, what happens if you have a reference to v, and to_be_filled somehow uses that reference? Sure, that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.

It's a lot harder to do that in return <identifier>; cases.
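A contrived sketch of that aliasing problem. Everything here is hypothetical (the Sink type, its alias member, and the Store functions), and it again uses std::shared_ptr as the payload so that the moved-from state is well defined:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Payload = std::shared_ptr<std::vector<int>>;

// Hypothetical destination type that keeps an alias to the caller's local
// variable and looks at it right after storing the value.
struct Sink {
    const Payload* alias = nullptr;  // points at the caller's local, if set
    Payload stored;
    bool alias_was_live = false;     // what the alias showed after assignment

    Sink& operator=(const Payload& src) {
        stored = src;                // copy: the source is untouched
        alias_was_live = (alias != nullptr) && static_cast<bool>(*alias);
        return *this;
    }
    Sink& operator=(Payload&& src) {
        stored = std::move(src);     // move: the source becomes empty
        alias_was_live = (alias != nullptr) && static_cast<bool>(*alias);
        return *this;
    }
};

bool StoreByCopy(Sink* to_be_filled) {
    assert(to_be_filled != nullptr);
    Payload v = std::make_shared<std::vector<int>>(3, 7);
    to_be_filled->alias = &v;
    *to_be_filled = v;               // the form used in the question
    to_be_filled->alias = nullptr;   // v is about to die; drop the alias
    return to_be_filled->alias_was_live;
}

bool StoreByMove(Sink* to_be_filled) {
    assert(to_be_filled != nullptr);
    Payload v = std::make_shared<std::vector<int>>(3, 7);
    to_be_filled->alias = &v;
    *to_be_filled = std::move(v);    // what an automatic promotion would do
    to_be_filled->alias = nullptr;
    return to_be_filled->alias_was_live;
}
```

With the copy, the alias still sees a live object during the assignment. With the move, the same read sees an emptied one. A compiler would have to prove that no such alias exists before it could promote v to an rvalue on its own.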

Nicol Bolas answered Oct 13 '22 01:10