I am currently in the middle of a project where performance is of vital importance. The following are some of the questions I have regarding this issue.
Question 1
My project involves plenty of boost::shared_ptr. I know that creating shared pointers on the fly using boost::make_shared is slow, since there is overhead in tracking references. I wanted to know: if the boost shared pointers are already created, would these two statements have the same performance, or would one be faster than the other? If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
statement1: sharedptr->someMethod(); //here the pointer is a shared ptr created by boost::make_shared
statement2: regularptr->someMethod(); //here the pointer is a regular one made with new
Question 2
I have an instance method that is called rapidly and creates a std::vector<std::string> on the stack every time. I decided instead to store a pointer to the vector in a static std::map, i.e. std::map<std::string,std::vector<std::string>*>. If no vector exists in the map for the key (which could be the name of the method), a vector is created and its address is added to the map. So my question is: is it worth searching a map for a vector's address and returning a valid address, over just creating one on the stack like std::vector<std::string> somevector? I would also like an idea of the performance of std::map::find.
Any ideas regarding these concerns would be appreciated.
If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
operator-> within boost::shared_ptr has an assertion:
typename boost::detail::sp_member_access< T >::type operator-> () const
{
BOOST_ASSERT( px != 0 );
return px;
}
So, first of all, be sure that you have NDEBUG defined (in release builds this is usually done automatically):
#define NDEBUG
I have made an assembler comparison between dereferencing a boost::shared_ptr and a raw pointer:
template<int tag,typename T>
NOINLINE void test(const T &p)
{
volatile auto anti_opti=0;
ASM_MARKER<tag+0>();
anti_opti = p->data;
anti_opti = p->data;
ASM_MARKER<tag+1>();
(void)anti_opti;
}
test<1000>(new Foo);
The ASM code of test when T is Foo* is (don't be scared, there is a diff below):
_Z4testILi1000EP3FooEvRKT0_:
.LFB4088:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi1000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi1001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
test<2000>(boost::make_shared<Foo>());
The ASM code of test when T is boost::shared_ptr<Foo>:
_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
.LFB4090:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi2000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi2001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
Here is the output of the diff -U 0 foo_p.asm shared_ptr_foo_p.asm command:
--- foo_p.asm Fri Apr 12 10:38:05 2013
+++ shared_ptr_foo_p.asm Fri Apr 12 10:37:52 2013
@@ -1,2 +1,2 @@
-_Z4testILi1000EP3FooEvRKT0_:
-.LFB4088:
+_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
+.LFB4090:
@@ -11 +11 @@
-call _Z10ASM_MARKERILi1000EEvv
+call _Z10ASM_MARKERILi2000EEvv
@@ -16 +16 @@
-call _Z10ASM_MARKERILi1001EEvv
+call _Z10ASM_MARKERILi2001EEvv
As you can see, the difference is only in the function signature and the tag non-type template argument value; the rest of the code is IDENTICAL.
In general, shared_ptr is very costly: its reference counting is synchronized between threads (usually via atomic operations). If you use boost::intrusive_ptr instead, you can implement your own increment/decrement without thread synchronization, which speeds up reference counting.
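For illustration, here is a minimal sketch of that idea, assuming single-threaded use (Widget and its members are hypothetical names):
#include <boost/intrusive_ptr.hpp>
class Widget
{
public:
    Widget() : ref_count_(0) {}
    void someMethod() {}
private:
    // Plain int counter - no atomic synchronization between threads.
    friend void intrusive_ptr_add_ref(Widget *w) { ++w->ref_count_; }
    friend void intrusive_ptr_release(Widget *w)
    {
        if (--w->ref_count_ == 0) delete w;
    }
    int ref_count_;
};
int main()
{
    boost::intrusive_ptr<Widget> p(new Widget); // calls intrusive_ptr_add_ref
    p->someMethod();
} // count drops to zero here, Widget is deleted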
If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), then there will be no reference counting at all, and it will be faster still.
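A minimal C++11 sketch of that, reusing the Foo type from the demo (make_foo is a hypothetical name):
#include <memory>
#include <utility>
struct Foo
{
    int data;
};
// Ownership is transferred by move - no reference count exists at all.
std::unique_ptr<Foo> make_foo()
{
    return std::unique_ptr<Foo>(new Foo); // moved out, never copied
}
int main()
{
    std::unique_ptr<Foo> p = make_foo();   // ownership received from make_foo
    std::unique_ptr<Foo> q = std::move(p); // p is now null
    q->data = 42;
}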
LIVE DEMO WITH ASM OUTPUT
#define NDEBUG
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#define NOINLINE __attribute__ ((noinline))
// Non-inlined marker: emits a distinct call so the region of interest
// can be located in the generated assembly.
template<int>
NOINLINE void ASM_MARKER()
{
volatile auto anti_opti = 11;
(void)anti_opti;
}
struct Foo
{
int data;
};
// Reads p->data twice between the markers; the volatile local keeps
// the reads from being optimized away.
template<int tag,typename T>
NOINLINE void test(const T &p)
{
volatile auto anti_opti=0;
ASM_MARKER<tag+0>();
anti_opti = p->data;
anti_opti = p->data;
ASM_MARKER<tag+1>();
(void)anti_opti;
}
int main()
{
{
auto p = new Foo;
test<1000>(p);
delete p;
}
{
test<2000>(boost::make_shared<Foo>());
}
}
I have an instance method that is called rapidly and creates a std::vector on the stack every time.
Generally, it is a good idea to reuse a vector's capacity in order to prevent costly reallocations: clear() destroys the elements but keeps the allocated buffer. For instance, it is better to replace:
{
for(/*...*/)
{
std::vector<value> temp;
// do work on temp
}
}
with:
{
std::vector<value> temp;
for(/*...*/)
{
// do work on temp
temp.clear();
}
}
But it looks like, given the type std::map<std::string,std::vector<std::string>*>, you are trying to perform some kind of memoization.
As already suggested, instead of std::map, which has O(log N) lookup/insert, you may try boost::unordered_map/std::unordered_map, which has O(1) average and O(N) worst-case complexity for lookup/insert, plus better locality/compactness (it is more cache-friendly).
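A hedged sketch of such memoization with std::unordered_map, assuming C++11 (getCached and buildVector are hypothetical names); storing the vectors by value also avoids the manual delete implied by storing raw pointers:
#include <string>
#include <unordered_map>
#include <vector>
// Hypothetical expensive construction of the vector.
std::vector<std::string> buildVector()
{
    return std::vector<std::string>{"a", "b", "c"};
}
const std::vector<std::string>& getCached(const std::string& key)
{
    static std::unordered_map<std::string, std::vector<std::string>> cache;
    auto it = cache.find(key); // O(1) average lookup
    if (it == cache.end())
        it = cache.emplace(key, buildVector()).first; // built once, reused afterwards
    return it->second;
}
int main()
{
    const auto& v = getCached("someMethod"); // first call builds the vector
    const auto& w = getCached("someMethod"); // second call hits the cache
    (void)v; (void)w;
}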
Also, consider trying Boost.Flyweight:
Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight, which acts as a drop-in replacement for const T.
For Question 1:
A major performance gain can be achieved through architecture and algorithm design; low-level concerns matter only once the high-level design is strong. Coming to your question: regular pointer performance is higher than shared_ptr's, but the overhead you take on by not using shared_ptr is also greater, which increases the cost of maintaining the code in the long run. Redundant object creation and destruction must be avoided in performance-critical applications; in such cases shared_ptr plays an important role in sharing common objects across threads while reducing the overhead of releasing resources. Yes, a shared pointer consumes more time than a regular pointer because of the reference count, the allocations (object, counter, deleter), etc. You can make shared_ptr faster by preventing unnecessary copies of it: pass it as a reference (shared_ptr const&). Moreover, if you don't need to share resources across threads, don't use shared_ptr; a regular pointer will give better performance in those cases.
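A minimal sketch of that last point (byValue and byConstRef are hypothetical names):
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
struct Foo
{
    int data;
};
// Pass by value: copies the shared_ptr, touching the atomic reference count.
void byValue(boost::shared_ptr<Foo> p) { p->data = 1; }
// Pass by const reference: no copy, no reference-count traffic.
void byConstRef(boost::shared_ptr<Foo> const &p) { p->data = 2; }
int main()
{
    boost::shared_ptr<Foo> p = boost::make_shared<Foo>();
    byValue(p);    // atomic increment + decrement
    byConstRef(p); // neither
}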
Question 2
If you want to reuse a pool of shared_ptr objects, you should look into the object pool design pattern: http://en.wikipedia.org/wiki/Object_pool_pattern
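For illustration, a minimal object pool sketch (ObjectPool, acquire and release are hypothetical names), here recycling std::vector buffers so their capacity survives between uses:
#include <memory>
#include <utility>
#include <vector>
template<typename T>
class ObjectPool
{
public:
    // Hands out a recycled object if available, otherwise allocates a new one.
    std::unique_ptr<T> acquire()
    {
        if (free_.empty())
            return std::unique_ptr<T>(new T);
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }
    // Returns an object to the pool instead of destroying it.
    void release(std::unique_ptr<T> obj)
    {
        free_.push_back(std::move(obj));
    }
private:
    std::vector<std::unique_ptr<T>> free_;
};
int main()
{
    ObjectPool<std::vector<int>> pool;
    auto v = pool.acquire();    // allocated on first use
    v->push_back(42);
    v->clear();                 // reset contents, capacity is kept
    pool.release(std::move(v)); // recycled for the next acquire()
}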