
Why would I std::move an std::shared_ptr?

People also ask

Why is std::move necessary?

std::move itself does "nothing" - it has zero side effects. It just signals to the compiler that the programmer doesn't care what happens to that object any more. i.e. it gives permission to other parts of the software to move from the object, but it doesn't require that it be moved.

Why would you choose shared_ptr instead of unique_ptr?

In short: Use unique_ptr when you want a single pointer to an object that will be reclaimed when that single pointer is destroyed. Use shared_ptr when you want multiple pointers to the same resource.

When should you use shared_ptr?

The object referenced by the contained raw pointer will not be destroyed as long as the reference count is greater than zero, i.e. until every shared_ptr that owns it has been destroyed or reset. So we should use shared_ptr when we want several owners referring to the same managed object.

Is copying shared_ptr thread safe?

Copying a std::shared_ptr in a thread is fine. Copy-constructing a local std::shared_ptr (such as localPtr) only modifies the control block, and that is thread-safe.

I think that the one thing the other answers did not emphasize enough is the point of speed.

The std::shared_ptr reference count is atomic, so increasing or decreasing it requires an atomic increment or decrement. This can be on the order of a hundred times slower than a non-atomic increment/decrement, depending on the hardware and on contention. And if we increment and then decrement the same counter, we wind up with the same number we started with, wasting time and resources in the process.

By moving the shared_ptr instead of copying it, we "steal" the pointer to the control block (and with it the reference count) and nullify the other shared_ptr. "Stealing" it does not touch the atomic counter at all, so it can be far faster than copying the shared_ptr (which causes an atomic increment now and an atomic decrement later).

Do note that this technique is used purely for optimization. Copying it (as you suggested) is just as fine functionality-wise.

By using move you avoid increasing, and then immediately decreasing, the number of shares. That might save you some expensive atomic operations on the use count.

Move operations (like move constructor) for std::shared_ptr are cheap, as they basically are "stealing pointers" (from source to destination; to be more precise, the whole state control block is "stolen" from source to destination, including the reference count information).

In contrast, copy operations on std::shared_ptr invoke an atomic reference count increase (i.e. not just ++RefCount on an integer RefCount data member, but e.g. calling InterlockedIncrement on Windows), which is more expensive than just stealing pointers/state.

So, analyzing the ref count dynamics of this case in detail:

// shared_ptr<CompilerInvocation> sp;

If you pass sp by value and then take a copy inside the CompilerInstance::setInvocation method, you have:

  1. When entering the method, the shared_ptr parameter is copy constructed: ref count atomic increment.
  2. Inside the method's body, you copy the shared_ptr parameter into the data member: ref count atomic increment.
  3. When exiting the method, the shared_ptr parameter is destructed: ref count atomic decrement.

You have two atomic increments and one atomic decrement, for a total of three atomic operations.

Instead, if you pass the shared_ptr parameter by value and then std::move inside the method (as properly done in Clang's code), you have:

  1. When entering the method, the shared_ptr parameter is copy constructed: ref count atomic increment.
  2. Inside the method's body, you std::move the shared_ptr parameter into the data member: ref count does not change! You are just stealing pointers/state: no expensive atomic ref count operations are involved.
  3. When exiting the method, the shared_ptr parameter is destructed; but since you moved in step 2, there's nothing to destruct, as the shared_ptr parameter is not pointing to anything anymore. Again, no atomic decrement happens in this case.

Bottom line: in this case you get just one ref count atomic increment, i.e. just one atomic operation.
As you can see, this is much better than two atomic increments plus one atomic decrement (for a total of three atomic operations) for the copy case.

There are two reasons for using std::move in this situation. Most responses addressed the issue of speed, but ignored the important issue of showing the code's intent more clearly.

For a std::shared_ptr, std::move unambiguously denotes a transfer of ownership of the pointee, while a simple copy operation adds an additional owner. Of course, if the original owner subsequently relinquishes their ownership (such as by allowing their std::shared_ptr to be destroyed), then a transfer of ownership has been accomplished.

When you transfer ownership with std::move, it's obvious what is happening. If you use a normal copy, it isn't obvious that the intended operation is a transfer until you verify that the original owner immediately relinquishes ownership. As a bonus, a more efficient implementation is possible, since a direct transfer of ownership can avoid the temporary state where the number of owners has increased by one (and the attendant changes in reference counts).

Copying a shared_ptr involves copying its internal state object pointer and changing the reference count. Moving it only involves swapping pointers to the internal reference counter, and the owned object, so it's faster.

Since none of these answers offered an actual benchmark, I thought I'd try to provide one. However, I think I've left myself more confused than when I started. I tried to come up with a test that would measure passing a shared_ptr<int> by value, by reference, and using std::move, performing an add operation on that value, and returning the result. I did this one million times using two sets of tests. The first set added a constant value to the shared_ptr<int>, the other added a random value in the [0, 10] range. I figured the constant value addition would be a candidate for heavy optimization, whereas the random value test would not. That is more or less what I saw, but the extreme differences in execution time lead me to believe that other factors or problems with this test program are what drive the execution time differences, not the move semantics.


For no optimizations (-O0), constant addition

  • std::move was ~4x faster than pass-by-value
  • std::move was marginally slower than pass-by-reference

For high optimizations (-O3), constant addition

  • std::move was 70-90 thousand times faster than pass-by-value
  • std::move was marginally faster than pass-by-reference (anywhere from 1-1.4 times)

For no optimizations (-O0), random addition

  • std::move was 1-2 times faster than pass-by-value
  • std::move was marginally slower than pass-by-reference

For high optimizations (-O3), random addition

  • std::move was 1-1.3 times faster than pass-by-value (marginally worse than no optimizations)
  • std::move was essentially the same as pass-by-reference

Finally, the test

#include <memory>
#include <iostream>
#include <chrono>
#include <ctime>
#include <random>
#include <utility>   // std::move

constexpr auto MAX_NUM_ITS = 1000000;

// using random values to try to cut down on massive compiler optimizations
static std::random_device RAND_DEV;
static std::mt19937 RNG(RAND_DEV());
static std::uniform_int_distribution<std::mt19937::result_type> DIST11(0,10);

void CopyPtr(std::shared_ptr<int> myInt)
{
    // demonstrates that use_count increases with each copy
    std::cout << "In CopyPtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myCopyInt(myInt);
    std::cout << "In CopyPtr: ref count = " << myCopyInt.use_count() << std::endl;
}

void ReferencePtr(std::shared_ptr<int>& myInt)
{
    // reference count stays the same until a copy is made
    std::cout << "In ReferencePtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myCopyInt(myInt);
    std::cout << "In ReferencePtr: ref count = " << myCopyInt.use_count() << std::endl;
}

void MovePtr(std::shared_ptr<int>&& myInt)
{
    // demonstrates that use_count remains constant with each move
    std::cout << "In MovePtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myMovedInt(std::move(myInt));
    std::cout << "In MovePtr: ref count = " << myMovedInt.use_count() << std::endl;
}

int CopyPtrFastConst(std::shared_ptr<int> myInt)
{
    return 5 + *myInt;
}

int ReferencePtrFastConst(std::shared_ptr<int>& myInt)
{
    return 5 + *myInt;
}

int MovePtrFastConst(std::shared_ptr<int>&& myInt)
{
    // note: myInt is bound as an rvalue reference but never moved from here
    return 5 + *myInt;
}

int CopyPtrFastRand(std::shared_ptr<int> myInt)
{
    return DIST11(RNG) + *myInt;
}

int ReferencePtrFastRand(std::shared_ptr<int>& myInt)
{
    return DIST11(RNG) + *myInt;
}

int MovePtrFastRand(std::shared_ptr<int>&& myInt)
{
    // note: myInt is bound as an rvalue reference but never moved from here
    return DIST11(RNG) + *myInt;
}

void RunConstantFunctions(std::shared_ptr<int> myInt)
{
    std::cout << "\nIn constant funcs, ref count = " << myInt.use_count() << std::endl;
    // demonstrates speed of each function
    int sum = 0;

    // Copy pointer
    auto start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += CopyPtrFastConst(myInt);
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> copyElapsed = end - start;
    std::cout << "CopyPtrConst sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";

    // pass pointer by reference
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += ReferencePtrFastConst(myInt);
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> refElapsed = end - start;
    std::cout << "ReferencePtrConst sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";

    // pass pointer using std::move
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += MovePtrFastConst(std::move(myInt));
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> moveElapsed = end - start;
    std::cout << "MovePtrConst sum = " << sum << ", took " << moveElapsed.count() <<
        " seconds.\n";

    std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
    std::cout << "std::move vs pass by ref:   " << refElapsed / moveElapsed << " times faster.\n";
}

void RunRandomFunctions(std::shared_ptr<int> myInt)
{
    std::cout << "\nIn random funcs, ref count = " << myInt.use_count() << std::endl;
    // demonstrates speed of each function
    int sum = 0;

    // Copy pointer
    auto start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += CopyPtrFastRand(myInt);
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> copyElapsed = end - start;
    std::cout << "CopyPtrRand sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";

    // pass pointer by reference
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += ReferencePtrFastRand(myInt);
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> refElapsed = end - start;
    std::cout << "ReferencePtrRand sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";

    // pass pointer using std::move
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
        sum += MovePtrFastRand(std::move(myInt));
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> moveElapsed = end - start;
    std::cout << "MovePtrRand sum = " << sum << ", took " << moveElapsed.count() <<
        " seconds.\n";

    std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
    std::cout << "std::move vs pass by ref:   " << refElapsed / moveElapsed << " times faster.\n";
}

int main()
{
    // demonstrates how use counts are affected between copy and move
    std::shared_ptr<int> myInt = std::make_shared<int>(5);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    CopyPtr(myInt);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    ReferencePtr(myInt);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    MovePtr(std::move(myInt));
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;

    // since myInt was moved to MovePtr and fell out of scope on return (was destroyed),
    // we have to reinitialize myInt
    myInt = std::make_shared<int>(5);

    RunConstantFunctions(myInt);
    RunRandomFunctions(myInt);

    return 0;
}


I noticed that the constant functions compiled to the same assembly at both -O0 and -O3, in both cases relatively short blocks. This makes me think the majority of the optimization comes from the calling code, but with my amateur assembly knowledge I can't really see that.

The random functions compiled to quite a bit of assembly, even for -O3, so the random part must be dominating that routine.

So in the end, I'm not really sure what to make of this. Please throw darts at it, tell me what I did wrong, offer some explanations.