I've recently read how std::move can speed up code by moving values instead of copying them. So I made a test program to compare the speed using std::vector.
The code:
#include <iostream>
#include <vector>
#include <string>  // std::string
#include <utility> // std::move
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#include <ctime>
#endif
#undef max
// Returns the amount of milliseconds elapsed since the UNIX epoch. Works on both
// windows and linux.
uint64_t GetTimeMs64()
{
#ifdef _WIN32
// Windows
FILETIME ft;
LARGE_INTEGER li;
// Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
// to a LARGE_INTEGER structure.
GetSystemTimeAsFileTime(&ft);
li.LowPart = ft.dwLowDateTime;
li.HighPart = ft.dwHighDateTime;
uint64_t ret = li.QuadPart;
ret -= 116444736000000000LL; // Convert from file time to UNIX epoch time.
ret /= 10000; // From 100 nano seconds (10^-7) to 1 millisecond (10^-3) intervals
return ret;
#else
// Linux
struct timeval tv;
gettimeofday(&tv, NULL);
uint64_t ret = tv.tv_usec;
// Convert from micro seconds (10^-6) to milliseconds (10^-3)
ret /= 1000;
// Adds the seconds (10^0) after converting them to milliseconds (10^-3)
ret += (static_cast<uint64_t>(tv.tv_sec) * 1000);
return ret;
#endif
}
static std::vector<std::string> GetVec1()
{
std::vector<std::string> o(100000, "abcd");
bool tr = true;
if (tr)
return std::move(o);
return std::move(std::vector<std::string>(100000, "abcd"));
}
static std::vector<std::string> GetVec2()
{
std::vector<std::string> o(100000, "abcd");
bool tr = true;
if (tr)
return o;
return std::vector<std::string>(100000, "abcd");
}
int main()
{
uint64_t timer;
std::vector<std::string> vec;
timer = GetTimeMs64();
for (int i = 0; i < 1000; ++i)
vec = GetVec1();
std::cout << GetTimeMs64() - timer << " timer 1(std::move)" << std::endl;
timer = GetTimeMs64();
for (int i = 0; i < 1000; ++i)
vec = GetVec2();
std::cout << GetTimeMs64() - timer << " timer 2(no move)" << std::endl;
std::cin.get();
return 0;
}
I got the following results:
Release (x86) /O2. tr = true
4376 timer 1(std::move)
4191 timer 2(no move)
Release (x86) /O2. tr = false
7311 timer 1(std::move)
7301 timer 2(no move)
The results of the two timers are really close. I assumed this is because of return value optimization (RVO), which means that my returns by value are already moved by the compiler without me knowing, right?
So then I ran new tests without any optimizations to make sure I was right. The results:
Release (x86) /Od. tr = true
40860 timer 1(std::move)
40863 timer 2(no move)
Release (x86) /Od. tr = false
83567 timer 1(std::move)
82075 timer 2(no move)
Now even though the difference between /O2 and /Od is really significant, the difference between no move and std::move (and even between tr being true or false) is minimal.
Does this mean that even though optimizations are disabled, the compiler is allowed to apply RVO, or is std::move not as fast as I thought it'd be?
There's a fundamental piece of info you're missing: the standard specifically requires that when a return statement (and a few other, less common contexts) names a function-local variable (such as o in your case), overload resolution to construct the return value from the argument is first performed as if the argument were an rvalue (even though it's not). Only when this fails is overload resolution done again, with the lvalue. This is covered by C++14 12.8/32; similar wording exists in C++11.
12.8/32 When the criteria for elision of a copy/move operation are met, but not for an exception-declaration, and the object to be copied is designated by an lvalue, or when the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [ Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note ] ...
(Emphasis mine)
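One way to see the "rvalue first" rule in action is to return a move-only local by name. The sketch below is only an illustration: MakeText and CheckMakeText are hypothetical names that do not appear in the question. std::unique_ptr has a deleted copy constructor, so the plain return p; can only compile because overload resolution first treats p as an rvalue.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical example function (not from the question).
// std::unique_ptr cannot be copied, only moved.
std::unique_ptr<std::string> MakeText() {
    std::unique_ptr<std::string> p(new std::string("abcd"));
    // No std::move here. p is an lvalue, yet this compiles because the
    // return statement first tries overload resolution with p as an rvalue,
    // which selects the move constructor (or the move is elided entirely).
    return p;
}

// Hypothetical helper that checks the returned object is usable.
void CheckMakeText() {
    std::unique_ptr<std::string> s = MakeText();
    assert(s && *s == "abcd");
}
```

If the rule did not exist, return p; would be ill-formed for any move-only type and every such function would need an explicit std::move.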
So in effect, there's an unavoidable, implicit std::move present in every return statement that returns a function-scope automatic variable.
Using std::move in a return statement is, if anything, a pessimisation. It prevents NRVO and gains you nothing, because of the "implicitly try rvalue first" rule.
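To check this without relying on timings, you could count constructor calls instead. The following is a minimal sketch, assuming a hypothetical Tracer type and helper functions (ByName, ByMove, Measure, CheckNoCopies) that are not part of the question's code. The exact number of moves depends on the compiler, the language version and whether elision is performed, so the checks only assert what holds in every case: neither form ever copies, and the std::move form always performs at least one move.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper type: counts copy and move constructions.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

struct Counts {
    int copies;
    int moves;
};

// Returns the local by name: eligible for NRVO. If the compiler does not
// elide, the move constructor is selected, never the copy constructor.
Tracer ByName() {
    Tracer t;
    return t;
}

// Wraps the local in std::move: the operand is no longer a plain
// id-expression, so NRVO cannot apply and a move construction is forced.
// (GCC and Clang may warn about this with -Wpessimizing-move.)
Tracer ByMove() {
    Tracer t;
    return std::move(t);
}

// Resets the counters, calls the factory once, and reports what happened.
template <class F>
Counts Measure(F make) {
    Tracer::copies = 0;
    Tracer::moves = 0;
    Tracer result = make();
    (void)result;
    Counts c = {Tracer::copies, Tracer::moves};
    return c;
}

void CheckNoCopies() {
    assert(Measure(ByName).copies == 0);  // by name: elided or moved
    assert(Measure(ByMove).copies == 0);  // std::move: moved
    assert(Measure(ByMove).moves >= 1);   // the move can never be elided
}
```

Printing Measure(ByName).moves and Measure(ByMove).moves under /O2 and /Od should show whether your compiler elides the return by name in each configuration, which is something the timings alone cannot tell you.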