
Which is faster on Visual C++ 2010 - std::shared_ptr or boost::shared_ptr?

Has anyone tested this in release mode builds? Or are the implementations so similar there's no significant difference?

I'm interested in the speed to:

  1. Create a new shared_ptr

  2. Create a copy of the shared_ptr

  3. De-reference the pointer to access the pointee

This would be in a release build optimized for speed, with new shared_ptrs created via make_shared().
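For reference, the three operations being compared can be sketched as follows. This uses only std::shared_ptr; `Widget` and the three helper functions are hypothetical names chosen for illustration and do not come from any benchmark.

```cpp
#include <memory>

// Hypothetical pointee type used only for illustration.
struct Widget {
    explicit Widget(int v) : value(v) {}
    int value;
};

// 1. Create: make_shared constructs the Widget and returns the owning pointer.
std::shared_ptr<Widget> create_widget(int v) {
    return std::make_shared<Widget>(v);
}

// 2. Copy: while the copy is alive, the shared count has one more owner.
long owners_after_copy(const std::shared_ptr<Widget>& p) {
    std::shared_ptr<Widget> copy = p;
    return p.use_count();
}

// 3. Dereference: operator-> reads through the stored raw pointer.
int read_value(const std::shared_ptr<Widget>& p) {
    return p->value;
}
```

Copying is the only one of the three that has to touch the reference count, which is why it is usually the operation people expect to differ between implementations.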

mpipe3 asked Jun 24 '11 16:06


2 Answers

VS10's version uses rvalue references and move semantics when possible, so in principle it has the upper hand over the Boost C++98 implementation. You'd probably have to work fairly hard to create a program that would show a significant practical difference, though... but do give it a try. Also don't forget about std::make_shared, which is new in C++0x thanks to forwarding.
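To make the move-semantics point concrete, here is a minimal sketch assuming a C++11 standard library. `push_by_copy` and `push_by_move` are hypothetical helpers, not part of either library; they only show that a move transfers ownership without changing the shared count, while a copy has to increment it.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Copying into the vector: the caller keeps its pointer, so the shared count
// must be incremented (typically an atomic increment).
long push_by_copy(std::vector<std::shared_ptr<int>>& out,
                  const std::shared_ptr<int>& p) {
    out.push_back(p);
    return out.back().use_count();
}

// Moving into the vector: ownership is transferred, so the count is left
// untouched and the source pointer ends up empty.
long push_by_move(std::vector<std::shared_ptr<int>>& out,
                  std::shared_ptr<int>&& p) {
    out.push_back(std::move(p));
    return out.back().use_count();
}
```

Whether the saved increment is measurable depends on how often temporaries are moved in the hot path, so treat this as an illustration of the mechanism rather than a prediction of speed.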

Update: Dereferencing and copying are going to be practically identical in any case. Perhaps there are some interesting differences in the way custom deleters and allocators are stored, and in how make_shared is implemented. Let me check the source.
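One place where make_shared can differ from plain construction is the number of heap allocations. The sketch below counts calls to a replaced global operator new. `g_allocations`, `Payload` and both helper functions are names invented for this illustration, and the counts it reports depend on the standard library in use, so this is a way to check rather than a statement about any particular implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Global allocation counter, bumped by the replacement operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical pointee type.
struct Payload { int value; };

// Constructing from a raw pointer: the Payload is allocated by the caller and
// the control block is allocated separately by shared_ptr.
std::size_t allocations_with_new(std::shared_ptr<Payload>& out) {
    const std::size_t before = g_allocations;
    out.reset(new Payload{1});
    return g_allocations - before;
}

// make_shared is allowed to place the Payload and the control block in a
// single allocation, and common implementations do so.
std::size_t allocations_with_make_shared(std::shared_ptr<Payload>& out) {
    const std::size_t before = g_allocations;
    out = std::make_shared<Payload>();
    return g_allocations - before;
}
```

Calling each helper on an empty shared_ptr and comparing the two return values shows how many allocations each construction path costs on a given toolchain.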

Update 2: Funnily enough, the Boost version that uses variadic templates and rvalue references definitely looks better than the VS10 version, since VS10 doesn't have variadic templates and has to employ horrible black arts to fake that behaviour. But that's entirely a compile-time issue, so it's not relevant to runtime speed.
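For readers unfamiliar with the variadic approach, a minimal sketch follows, assuming a compiler with variadic templates and rvalue references. `make_shared_sketch` and `Person` are hypothetical names, and this is not how either library implements make_shared. In particular it allocates the object and the control block separately.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper: one variadic template covers constructors of any
// arity and forwards each argument with its value category intact.
template <typename T, typename... Args>
std::shared_ptr<T> make_shared_sketch(Args&&... args) {
    return std::shared_ptr<T>(new T(std::forward<Args>(args)...));
}

// Hypothetical type with a two-argument constructor.
struct Person {
    Person(std::string n, int a) : name(std::move(n)), age(a) {}
    std::string name;
    int age;
};
```

A call such as `make_shared_sketch<Person>(std::string("Ada"), 36)` forwards both arguments to the constructor. Without variadic templates, the same interface needs a separate overload for each argument count.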

Kerrek SB answered Oct 10 '22 03:10


Ok, so it doesn't look like anyone has done this. Here's what I found using the standard VC 10 optimized settings for a WIN32 console app:

  1. Visual C++ 2010 SP1 std::make_shared and std::shared_ptr were faster than the Boost 1.46.1 equivalents when populating a vector of 10 million pointer entries (0.92 secs for Visual C++ versus 1.96 secs for Boost, averaged across 20 runs)

  2. Boost 1.46.1 was slightly faster than Visual C++ 2010 SP1 when copying a vector of 10 million pointer entries (0.15 secs versus 0.17 secs averaged over 20 runs)

  3. Visual C++ 2010 SP1 was slightly faster than the Boost 1.46.1 equivalents when dereferencing a vector of 10 million pointer entries 20 times (0.72 secs versus 0.811 secs averaged over 20 runs)

CONCLUSION: There was a significant difference when creating shared_ptrs to populate a vector. The Visual C++ 2010 shared_ptr was roughly twice as fast, indicating a substantial difference in implementation compared to Boost 1.46.1.

The other tests didn't show a significant difference.

Here's the code I used:

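#include "stdafx.h"
// Headers used below; presumably pulled in via stdafx.h in the original project.
#include <algorithm>
#include <iostream>
#include <memory>
#include <vector>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/timer.hpp>

struct A
{
    A( const unsigned value) : m_value(value) {}

    const unsigned m_value;
};

typedef std::shared_ptr<A> APtr;
typedef boost::shared_ptr<A> ABoostPtr;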

double TestSTLCreateSpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<APtr> buffer;

    boost::timer timer;

    // Timed section: create and store 10 million shared_ptrs.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( std::make_shared<A>(nEntry) );

    const double timeTaken = timer.elapsed();

    std::cout << "STL create test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

double BoostSTLCreateSpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<ABoostPtr> buffer;

    boost::timer timer;

    // Timed section: create and store 10 million shared_ptrs.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( boost::make_shared<A>(nEntry) );

    const double timeTaken = timer.elapsed();

    std::cout << "BOOST create test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

double TestSTLCopySpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<APtr> buffer;

    // Untimed setup: populate the source vector.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( std::make_shared<A>(nEntry) );

    // Timed section: copy every shared_ptr into a second vector.
    boost::timer timer;
    std::vector<APtr> buffer2 = buffer;

    const double timeTaken = timer.elapsed();

    std::cout << "STL copy test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

double TestBoostCopySpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<ABoostPtr> buffer;

    // Untimed setup: populate the source vector.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( boost::make_shared<A>(nEntry) );

    // Timed section: copy every shared_ptr into a second vector.
    boost::timer timer;
    std::vector<ABoostPtr> buffer2 = buffer;

    const double timeTaken = timer.elapsed();

    std::cout << "BOOST copy test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

double TestBoostDerefSpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<ABoostPtr> buffer;

    // Untimed setup: populate the vector.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( boost::make_shared<A>(nEntry) );

    boost::timer timer;

    unsigned total = 0;

    // Timed section: dereference every pointer, 20 passes over the vector.
    for(unsigned nIter = 0; nIter < 20; ++nIter)
    {
        std::for_each( buffer.begin(), buffer.end(),
            [&](const ABoostPtr& pA){
                total += pA->m_value;
            });
    }

    const double timeTaken = timer.elapsed();

    // Printing the total keeps the optimizer from discarding the loop.
    std::cout << "BOOST deref total =  " << total << ".\r\n";

    std::cout << "BOOST deref test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

double TestSTLDerefSpeed()
{
    const unsigned NUM_ENTRIES = 10000000;
    std::vector<APtr> buffer;

    // Untimed setup: populate the vector.
    for( unsigned nEntry = 0; nEntry < NUM_ENTRIES; ++nEntry)
        buffer.emplace_back( std::make_shared<A>(nEntry) );

    boost::timer timer;

    unsigned total = 0;

    // Timed section: dereference every pointer, 20 passes over the vector.
    for(unsigned nIter = 0; nIter < 20; ++nIter)
    {
        std::for_each( buffer.begin(), buffer.end(),
            [&](const APtr& pA){
                total += pA->m_value;
            });
    }

    const double timeTaken = timer.elapsed();

    // Printing the total keeps the optimizer from discarding the loop.
    std::cout << "STL deref total =  " << total << ".\r\n";

    std::cout << "STL deref test took " << timeTaken << " secs.\r\n";
    return timeTaken;
}

int _tmain(int argc, _TCHAR* argv[])
{
    double totalTime = 0.0;
    const unsigned NUM_TESTS = 20;

    // Each test is run NUM_TESTS times and the mean time is reported.
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += BoostSTLCreateSpeed();

    std::cout << "BOOST create test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    totalTime = 0.0;
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += TestSTLCreateSpeed();

    std::cout << "STL create test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    totalTime = 0.0;
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += TestBoostCopySpeed();

    std::cout << "BOOST copy test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    totalTime = 0.0;
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += TestSTLCopySpeed();

    std::cout << "STL copy test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    totalTime = 0.0;
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += TestBoostDerefSpeed();

    std::cout << "BOOST deref test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    totalTime = 0.0;
    for ( unsigned nTest = 0; nTest < NUM_TESTS; ++nTest)
        totalTime += TestSTLDerefSpeed();

    std::cout << "STL deref test took " << totalTime / NUM_TESTS << " secs average.\r\n";

    return 0;
}
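The averaging loops above all follow the same pattern, which could be factored into a helper. The sketch below is one way to do that, assuming a C++11 standard library with std::chrono (newer than what ships with Visual C++ 2010) in place of boost::timer. `average_seconds` is an illustrative name and was not part of the code used for the measurements.

```cpp
#include <chrono>
#include <functional>

// Runs `test` `runs` times and returns the mean wall-clock time in seconds.
// Returns 0.0 when `runs` is zero to avoid dividing by zero.
double average_seconds(unsigned runs, const std::function<void()>& test) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (unsigned i = 0; i < runs; ++i) {
        const clock::time_point start = clock::now();
        test();
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return runs ? total / runs : 0.0;
}
```

Note that this times the whole callable, so any setup that should stay outside the measurement (such as populating the vector in the copy and deref tests) would have to be done before the call and captured by the lambda.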

I'll wait a while and if no one has refuted my results or come up with some better conclusions I'll accept my own answer.

mpipe3 answered Oct 10 '22 02:10
