First, please take a look at the following code, which consists of two translation units.
--- foo.h ---
#include <string>

class Foo
{
public:
    Foo();
    Foo(const Foo& rhs);
    void print() const;
private:
    std::string str_;
};

Foo getFoo();
--- foo.cpp ---
#include <iostream>
#include <string>
#include "foo.h"

Foo::Foo() : str_("hello")
{
    std::cout << "Default Ctor" << std::endl;
}

Foo::Foo(const Foo& rhs) : str_(rhs.str_)
{
    std::cout << "Copy Ctor" << std::endl;
}

void Foo::print() const
{
    std::cout << "print [" << str_ << "]" << std::endl;
}

Foo getFoo()
{
    return Foo(); // Expecting RVO
}
--- main.cpp ---
#include "foo.h"

int main()
{
    Foo foo = getFoo();
    foo.print();
}
Please note that foo.cpp and main.cpp are different translation units. So, as I understand it, no implementation details of getFoo() are available in the translation unit main.o (main.cpp).
However, when I compile and run the above, the "Copy Ctor" string is never printed, which indicates that RVO works here.
I would really appreciate it if someone could explain how this can be achieved even though the implementation details of 'getFoo()' are not exposed to the translation unit main.o.
I conducted the above experiment by using GCC (g++) 4.4.6.
The compiler simply has to work consistently.
In other words, the compiler has to look solely at a return type, and based on that type, decide how a function returning an object of that type will return the value.
At least in a typical case, that decision is fairly trivial. It sets aside a register (or possibly two) to use for return values (e.g., on an Intel/AMD x86/x64 that'll normally be EAX or RAX). Any type small enough to fit into that will be returned there. For any type too large to fit there, the function will receive a hidden pointer/reference parameter that tells it where to deposit the return result. Note that this much applies without RVO/NRVO being involved at all -- in fact, it applies equally to C code that returns a struct as it does to C++ returning a class object. Although returning a struct probably isn't quite as common in C as in C++, it's still allowed, and the compiler has to be able to compile code that does it.
There are really two separate (possible) copies that can be eliminated. One is that the compiler may allocate space on the stack for a local holding what will be the return value, then copy from there to where the pointer refers during the return.
The second is a possible copy from that return address into some other location where the value really needs to end up.
The first gets eliminated inside the function itself, but has no effect on its external interface. It ultimately puts the data wherever the hidden pointer tells it to -- the only question is whether it creates a local copy first, or always works directly with the return point. Obviously with [N]RVO, it always works directly.
The second possible copy is from that (potential) temporary into wherever the value really needs to end up. This is eliminated by optimizing the calling sequence, not the function itself -- i.e., giving the function a pointer to the final destination for that return value, rather than to some temporary location, from which the compiler will then copy the value into its destination.
main doesn't need the implementation details of getFoo for RVO to occur. It simply expects the return value to be in the agreed return location after getFoo exits (a register for a small type, or caller-provided memory for a type like Foo).
getFoo has two options for this: create an object in its own scope and then copy (or move) it to the return location, or create the object directly in that location. The latter is what happens.
It's not telling main to look anywhere else, nor does it need to. It just uses the return location directly.