I'm a newcomer to C++ and I recently ran into a problem returning a reference to a local variable. I solved it by changing the return type from std::string& to std::string. However, to my understanding this can be very inefficient. Consider the following code:
#include <string>

std::string hello()
{
    std::string result = "hello";
    return result;
}

int main()
{
    std::string greeting = hello();
}
To my understanding, what happens is:
1) hello() is called.
2) result is assigned a value of "hello".
3) result is copied into the variable greeting.
This probably doesn't matter that much for std::string, but it can definitely get expensive if you have, for example, a hash table with hundreds of entries.
How do you avoid copy-constructing a returned temporary, and instead return a copy of the pointer to the object (essentially, a copy of the local variable)?
(Sidenote: I've heard that the compiler will sometimes perform return-value optimization to avoid calling the copy constructor, but I think it's best not to rely on compiler optimizations to make your code run efficiently.)
The copy constructor is called because you call by value not by reference. Therefore a new object must be instantiated from your current object since all members of the object should have the same value in the returned instance.
Commercial-grade C++ compilers won't do that: the return statement will directly construct x itself. Not a copy of x, not a pointer to x, not a reference to x, but x itself.
The copy constructor is used to: initialize one object from another of the same type; copy an object to pass it as an argument to a function; copy an object to return it from a function.
A copy constructor can also be defined by a user; in this case, the default copy constructor is not called.
The description in your question is pretty much correct. But it is important to understand that this is the behavior of the abstract C++ machine. In fact, the canonical description of abstract return behavior is even less optimal:
1) result is copied into a nameless intermediate temporary object of type std::string. That temporary persists after the function's return.
2) That temporary is then copied into greeting after the function returns.
Most compilers have always been smart enough to eliminate that intermediate temporary in full accordance with the classic copy elision rules. But even without that intermediate temporary the behavior has always been seen as grossly suboptimal. Which is why a lot of freedom was given to compilers in order to provide them with optimization opportunities in return-by-value contexts. Originally it was Return Value Optimization (RVO). Later, Named Return Value Optimization (NRVO) was added to it. And finally, in C++11, move semantics became an additional way to optimize the return behavior in such cases.
Note that under NRVO in your example the initialization of result with "hello" actually places that "hello" directly into greeting from the very beginning.
So in modern C++ the best advice is: leave it as is and don't avoid it. Return it by value. (And prefer to use immediate initialization at the point of declaration whenever you can, instead of opting for default initialization followed by assignment.)
Firstly, the compiler's RVO/NRVO capabilities can (and will) eliminate the copying. In any self-respecting compiler RVO/NRVO is not something obscure or secondary. It is something compiler writers do actively strive to implement and implement properly.
Secondly, there's always move semantics as a fallback solution if RVO/NRVO somehow fails or is not applicable. Moving is naturally applicable in return-by-value contexts and it is much less expensive than full-blown copying for non-trivial objects. And std::string is a movable type.
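As a rough illustration of the two points above, here is a minimal sketch that counts constructor calls when a named local is returned by value. Tracer, g_copies, g_moves, make_tracer and copies_made_by_return are hypothetical names made up for this example, and it assumes a C++11 or later compiler. Whether the compiler elides the object entirely or falls back to a move depends on the compiler and its flags, but in neither case should the copy constructor run.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical global counters, for illustration only.
int g_copies = 0;
int g_moves = 0;

// Hypothetical tracer type that records how it gets constructed.
struct Tracer {
    std::string payload;

    explicit Tracer(std::string p) : payload(std::move(p)) {}
    Tracer(const Tracer& other) : payload(other.payload) { ++g_copies; }
    Tracer(Tracer&& other) noexcept : payload(std::move(other.payload)) { ++g_moves; }
};

// Same shape as hello() in the question: a named local returned by value.
Tracer make_tracer() {
    Tracer result("hello");
    return result;  // NRVO candidate; if not elided, result is moved, not copied
}

// Returns how many copy-constructor calls one return-by-value caused.
int copies_made_by_return() {
    g_copies = 0;
    g_moves = 0;
    Tracer greeting = make_tracer();
    assert(greeting.payload == "hello");
    return g_copies;
}
```

After calling copies_made_by_return(), g_moves shows which mechanism was used: 0 means every intermediate object was elided, anything higher means the move fallback kicked in.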
I disagree with the sentence "I think it's best not to rely on compiler optimizations to make your code run efficiently." That's basically the compiler's whole job. Your job is to write clear, correct, and maintainable source code. For every performance issue I've ever had to fix, I've had to fix a hundred or more issues caused by a developer trying to be clever instead of doing something simple, correct, and maintainable.
Let's take a look at some of the things you could do to try to "help" the compiler and see how they affect the maintainability of the source code.
For example:
void hello(std::string& outString)
Returning data using a reference makes the code at the call-site hard to read. It's nearly impossible to tell what function calls mutate state as a side effect and which don't. Even if you're really careful with const qualifying the references it's going to be hard to read at the call site. Consider the following example:
void hello(std::string& outString); //<-This one could modify outString
void out(const std::string& toWrite); //<-This one definitely doesn't.
. . .
std::string myString;
hello(myString); //<-This one maybe mutates myString - hard to tell.
out(myString); //<-This one certainly doesn't, but it looks identical to the one above
Even the declaration of hello isn't clear. Does it modify outString, or was the author just sloppy and forgot to const qualify the reference? Code that is written in a functional style is easier to read and understand and harder to accidentally break.
Avoid returning the data via reference
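For contrast, here is a small sketch of the same idea written in the return-by-value style. The names length_of and greeting_length are hypothetical and only exist for this example. The point is that the call site shows where the data comes from, and the const on the local documents that nothing mutates it afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns its result by value: no caller state can be touched.
std::string hello() {
    return "hello";
}

// Takes its input by const reference: it cannot modify the caller's string.
std::size_t length_of(const std::string& text) {
    return text.size();
}

// Hypothetical call site: data flows visibly from hello() into greeting.
std::size_t greeting_length() {
    const std::string greeting = hello();  // const: later calls cannot mutate it
    assert(!greeting.empty());
    return length_of(greeting);
}
```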
Returning a pointer to the object makes it hard to be sure your code is even correct. Unless you use a unique_ptr you have to trust that anybody using your method is thorough and makes sure to delete the pointer when they're done with it, but that isn't very RAII. std::string is already a type of RAII wrapper for a char* that abstracts away the data lifetime issues associated with returning a pointer. Returning a pointer to a std::string just re-introduces the problems that std::string was designed to solve. Relying on a human being to be diligent and carefully read the documentation for your function and know when to delete the pointer and when not to delete the pointer is unlikely to have a positive outcome.
Avoid returning a pointer to the object instead of returning the object
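If an object really does have to live on the heap, the unique_ptr mentioned above at least makes the ownership explicit. The sketch below shows both options side by side; Report, make_report_on_heap, make_report and title_via_pointer are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type standing in for some large object.
struct Report {
    std::string title;
};

// Heap allocation with explicit ownership: the caller receives a
// unique_ptr, so the Report is deleted automatically when it goes away.
std::unique_ptr<Report> make_report_on_heap() {
    std::unique_ptr<Report> report(new Report{"Q3"});
    return report;
}

// Usually simpler: return the object itself and let elision or a move
// take care of the cost.
Report make_report() {
    Report report{"Q3"};
    return report;
}

// Hypothetical caller: no delete is needed anywhere.
std::string title_via_pointer() {
    std::unique_ptr<Report> owner = make_report_on_heap();
    assert(owner != nullptr);
    return owner->title;
}
```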
A move constructor will just transfer ownership of the pointed-to data from 'result' to its final destination. Afterwards, the 'result' object is left in a valid but unspecified state, but that doesn't matter: your method ended and the 'result' object went out of scope. No copy, just a transfer of ownership of the pointer with clear semantics.
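Normally the compiler will call the move constructor for you. If you're really paranoid (or have specific knowledge that the compiler isn't going to help you) you can use std::move, but note that writing return std::move(result); on a local variable prevents copy elision, so it is rarely the better choice.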
Use move constructors if at all possible
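The "transfer of ownership" can be observed directly. The following sketch uses std::vector rather than std::string, because short strings are often stored inline inside the string object and would hide the effect. The names move_reuses_buffer and make_values are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns true if move-constructing a vector takes over the source's
// heap buffer instead of allocating and copying 1000 elements.
bool move_reuses_buffer() {
    std::vector<int> source(1000, 42);
    const int* buffer_before = source.data();

    std::vector<int> destination = std::move(source);  // move constructor

    assert(destination.size() == 1000);
    return destination.data() == buffer_before;
}

// Returning a local by name needs no explicit std::move: the compiler
// either elides the object or moves it automatically.
std::vector<int> make_values() {
    std::vector<int> values(1000, 42);
    return values;
}
```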
Finally, modern compilers are amazing. With a modern C++ compiler, 99% of the time the compiler is going to do some sort of optimization to eliminate the copy. The other 1% of the time it's probably not going to matter for performance. In specific circumstances the compiler can rewrite a method like std::string GetString(); to void GetString(std::string& outVar); automatically. The code is still easy to read, but in the final assembly you get all of the real or imagined speed benefits of returning by reference. Don't sacrifice readability and maintainability for performance unless you have specific knowledge that the solution doesn't meet your business requirements.
There are plenty of ways to achieve that:
1) Return some data by reference
void SomeFunc(std::string& sResult)
{
    sResult = "Hello world!";
}
2) Return a pointer to the object
CSomeHugeClass* SomeFunc()
{
    CSomeHugeClass* pPtr = new CSomeHugeClass();
    //...
    return(pPtr);
}
3) C++11 can use a move constructor in such cases.