std::move and RVO optimizations

Question

I've recently read how std::move can speed up code by just moving the values instead of copying them. So I made a test program to compare the speed using std::vector.

The code:

#include <iostream>
#include <string>
#include <vector>
#include <stdint.h>

#ifdef _WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#include <ctime>
#endif
#undef max

// Returns the number of milliseconds elapsed since the UNIX epoch. Works on both
// Windows and Linux.

uint64_t GetTimeMs64()
{
#ifdef _WIN32
    // Windows
    FILETIME ft;
    LARGE_INTEGER li;

    // Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
    // to a LARGE_INTEGER structure.
    GetSystemTimeAsFileTime(&ft);
    li.LowPart = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;

    uint64_t ret = li.QuadPart;
    ret -= 116444736000000000LL; // Convert from file time to UNIX epoch time.
    ret /= 10000; // From 100 nano seconds (10^-7) to 1 millisecond (10^-3) intervals

    return ret;
#else
    // Linux
    struct timeval tv;

    gettimeofday(&tv, NULL);

    uint64_t ret = tv.tv_usec;
    // Convert from micro seconds (10^-6) to milliseconds (10^-3)
    ret /= 1000;

    // Adds the seconds (10^0) after converting them to milliseconds (10^-3)
    ret += static_cast<uint64_t>(tv.tv_sec) * 1000; // widen first so the multiplication cannot overflow a 32-bit time_t

    return ret;
#endif
}

static std::vector<std::string> GetVec1()
{
    std::vector<std::string> o(100000, "abcd");
    bool tr = true;
    if (tr)
        return std::move(o);
    return std::move(std::vector<std::string>(100000, "abcd"));
}

static std::vector<std::string> GetVec2()
{
    std::vector<std::string> o(100000, "abcd");
    bool tr = true;
    if (tr)
        return o;
    return std::vector<std::string>(100000, "abcd");
}

int main()
{
    uint64_t timer;
    std::vector<std::string> vec;

    timer = GetTimeMs64();
    for (int i = 0; i < 1000; ++i)
        vec = GetVec1();
    std::cout << GetTimeMs64() - timer << " timer 1(std::move)" << std::endl;
    timer = GetTimeMs64();
    for (int i = 0; i < 1000; ++i)
        vec = GetVec2();
    std::cout << GetTimeMs64() - timer << " timer 2(no move)" << std::endl;
    std::cin.get();
    return 0;
}

I got the following results:

Release (x86) /O2. tr = true

4376 timer 1(std::move)

4191 timer 2(no move)

Release (x86) /O2. tr = false

7311 timer 1(std::move)

7301 timer 2(no move)

The results of the two timers are really close and barely differ. I assume this is because of return value optimization (RVO), which would mean that my returns by value are already moved by the compiler without me knowing, right?

So then I ran new tests without any optimizations to make sure I was right. The results:

Release (x86) /Od. tr = true

40860 timer 1(std::move)

40863 timer 2(no move)

Release (x86) /Od. tr = false

83567 timer 1(std::move)

82075 timer 2(no move)

Now even though the difference between /O2 and /Od is really significant, the difference between no move or std::move (and even between tr being true or false) is minimal.

Does this mean that even though optimizations are disabled, the compiler is allowed to apply RVO, or is std::move not as fast as I thought it'd be?

Angew is no longer proud of SO · Accepted Answer

There's a fundamental piece of info you're missing: the standard specifically enforces that when a return statement (and a few other, less common contexts) specifies a function-local variable (such as o in your case), overload resolution to construct the return value from the argument is first performed as if the argument was an rvalue (even though it's not). Only when this fails is overload resolution done again, with the lvalue. This is covered by C++14 12.8/32; similar wording exists in C++11.

12.8/32 When the criteria for elision of a copy/move operation are met, but not for an exception-declaration, and the object to be copied is designated by an lvalue, or when the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [ Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note ] ...

(Emphasis mine)

So in effect, there's an unavoidable, implicit std::move present in every return statement when returning a function-scope automatic variable.

Using std::move in a return statement is, if anything, a pessimisation. It prevents NRVO, and does not get you anything, due to the "implicitly try rvalue first" rule.

Tags:

c++

optimization

c++11

return-value-optimization

visual-studio-2015

Hatted Rooster

1 Answers

Angew is no longer proud of SO
