I am currently in the middle of a project where performance is of vital importance. The following are some of the questions I have regarding this issue.
Question 1
My project involves plenty of boost::shared_ptr. I know that creating shared pointers on the fly using boost::make_shared is slow, since there is overhead in tracking references. I wanted to know: if the boost shared pointers are already created, would these two statements have the same performance, or would one be faster than the other? If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
statement1: sharedptr->someMethod(); //here the pointer is a shared ptr created by boost::make_shared
statement2: regularptr->someMethod(); //here the pointer is a regular one made with new
Question 2
I have an instance method that is called rapidly and creates a std::vector<std::string> on the stack every time. I decided instead to store a pointer to the vector in a static std::map, i.e. std::map<std::string,std::vector<std::string>*>. If no vector exists in the map for the key (which could be the name of the method), a vector is created and its address is added to the map. So my question is: is it worth searching a map for a vector's address and returning a valid address, over just creating one on the stack like std::vector<std::string> somevector? I would also like an idea of the performance of std::map::find.
Any ideas regarding these concerns would be appreciated.
If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
operator-> within boost::shared_ptr has an assertion:
typename boost::detail::sp_member_access< T >::type operator-> () const
{
BOOST_ASSERT( px != 0 );
return px;
}
So, first of all, be sure that you have NDEBUG defined (in release builds this is usually done automatically):
#define NDEBUG
I have made an assembler comparison between dereferencing a boost::shared_ptr and a raw pointer:
template<int tag,typename T>
NOINLINE void test(const T &p)
{
volatile auto anti_opti=0;
ASM_MARKER<tag+0>();
anti_opti = p->data;
anti_opti = p->data;
ASM_MARKER<tag+1>();
(void)anti_opti;
}
test<1000>(new Foo);
The ASM code of test when T is Foo* is (don't be scared, there is a diff below):
_Z4testILi1000EP3FooEvRKT0_:
.LFB4088:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi1000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi1001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
test<2000>(boost::make_shared<Foo>());
The ASM code of test when T is boost::shared_ptr<Foo>:
_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
.LFB4090:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi2000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi2001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
Here is the output of the diff -U 0 foo_p.asm shared_ptr_foo_p.asm command:
--- foo_p.asm Fri Apr 12 10:38:05 2013
+++ shared_ptr_foo_p.asm Fri Apr 12 10:37:52 2013
@@ -1,2 +1,2 @@
-_Z4testILi1000EP3FooEvRKT0_:
-.LFB4088:
+_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
+.LFB4090:
@@ -11 +11 @@
-call _Z10ASM_MARKERILi1000EEvv
+call _Z10ASM_MARKERILi2000EEvv
@@ -16 +16 @@
-call _Z10ASM_MARKERILi1001EEvv
+call _Z10ASM_MARKERILi2001EEvv
As you can see, the difference is only in the function signature and the tag non-type template argument value; the rest of the code is IDENTICAL.
In general, shared_ptr is very costly: its reference counting is synchronized between threads (usually via atomic operations). If you use boost::intrusive_ptr instead, you can implement your own increment/decrement without thread synchronization, which speeds up reference counting.
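For illustration, here is a minimal sketch of that idea, assuming single-threaded use (Widget and its members are hypothetical names):
#include <boost/intrusive_ptr.hpp>
class Widget
{
public:
    Widget() : ref_count_(0) {}
    void someMethod() {}
private:
    // Plain int counter - no atomic synchronization between threads.
    friend void intrusive_ptr_add_ref(Widget *w) { ++w->ref_count_; }
    friend void intrusive_ptr_release(Widget *w)
    {
        if (--w->ref_count_ == 0) delete w;
    }
    int ref_count_;
};
int main()
{
    boost::intrusive_ptr<Widget> p(new Widget); // calls intrusive_ptr_add_ref
    p->someMethod();
} // count drops to zero here, Widget is deleted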
If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), then there will be no reference counting at all, and it will be faster still.
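A minimal C++11 sketch of that, reusing the Foo type from the demo (make_foo is a hypothetical name):
#include <memory>
#include <utility>
struct Foo
{
    int data;
};
// Ownership is transferred by move - no reference count exists at all.
std::unique_ptr<Foo> make_foo()
{
    return std::unique_ptr<Foo>(new Foo); // moved out, never copied
}
int main()
{
    std::unique_ptr<Foo> p = make_foo();   // ownership received from make_foo
    std::unique_ptr<Foo> q = std::move(p); // p is now null
    q->data = 42;
}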
LIVE DEMO WITH ASM OUTPUT
#define NDEBUG
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#define NOINLINE __attribute__ ((noinline))
// Non-inlined marker: emits a distinct call so the region of interest
// can be located in the generated assembly.
template<int>
NOINLINE void ASM_MARKER()
{
volatile auto anti_opti = 11;
(void)anti_opti;
}
struct Foo
{
int data;
};
// Reads p->data twice between the markers; the volatile local keeps
// the reads from being optimized away.
template<int tag,typename T>
NOINLINE void test(const T &p)
{
volatile auto anti_opti=0;
ASM_MARKER<tag+0>();
anti_opti = p->data;
anti_opti = p->data;
ASM_MARKER<tag+1>();
(void)anti_opti;
}
int main()
{
{
auto p = new Foo;
test<1000>(p);
delete p;
}
{
test<2000>(boost::make_shared<Foo>());
}
}
I have an instance method that is called rapidly and creates a std::vector on the stack every time.
Generally, it is a good idea to reuse a vector's capacity in order to prevent costly reallocations: clear() destroys the elements but keeps the allocated buffer. For instance, it is better to replace:
{
for(/*...*/)
{
std::vector<value> temp;
// do work on temp
}
}
with:
{
std::vector<value> temp;
for(/*...*/)
{
// do work on temp
temp.clear();
}
}
But it looks like, given the type std::map<std::string,std::vector<std::string>*>, you are trying to perform some kind of memoization.
As already suggested, instead of std::map, which has O(log N) lookup/insert, you may try boost::unordered_map/std::unordered_map, which has O(1) average and O(N) worst-case complexity for lookup/insert, plus better locality/compactness (it is more cache-friendly).
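A hedged sketch of such memoization with std::unordered_map, assuming C++11 (getCached and buildVector are hypothetical names); storing the vectors by value also avoids the manual delete implied by storing raw pointers:
#include <string>
#include <unordered_map>
#include <vector>
// Hypothetical expensive construction of the vector.
std::vector<std::string> buildVector()
{
    return std::vector<std::string>{"a", "b", "c"};
}
const std::vector<std::string>& getCached(const std::string& key)
{
    static std::unordered_map<std::string, std::vector<std::string>> cache;
    auto it = cache.find(key); // O(1) average lookup
    if (it == cache.end())
        it = cache.emplace(key, buildVector()).first; // built once, reused afterwards
    return it->second;
}
int main()
{
    const auto& v = getCached("someMethod"); // first call builds the vector
    const auto& w = getCached("someMethod"); // second call hits the cache
    (void)v; (void)w;
}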
Also, consider trying Boost.Flyweight:
Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight, which acts as a drop-in replacement for const T.
For Question 1:
A major performance gain can be achieved through architecture and algorithm design; low-level concerns matter only once the high-level design is strong. Coming to your question: regular pointer performance is higher than shared_ptr's, but the overhead you take on by not using shared_ptr is also greater, which increases the cost of maintaining the code in the long run. Redundant object creation and destruction must be avoided in performance-critical applications; in such cases shared_ptr plays an important role in sharing common objects across threads while reducing the overhead of releasing resources. Yes, a shared pointer consumes more time than a regular pointer because of the reference count, the allocations (object, counter, deleter), etc. You can make shared_ptr faster by preventing unnecessary copies of it: pass it as a reference (shared_ptr const&). Moreover, if you don't need to share resources across threads, don't use shared_ptr; a regular pointer will give better performance in those cases.
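A minimal sketch of that last point (byValue and byConstRef are hypothetical names):
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
struct Foo
{
    int data;
};
// Pass by value: copies the shared_ptr, touching the atomic reference count.
void byValue(boost::shared_ptr<Foo> p) { p->data = 1; }
// Pass by const reference: no copy, no reference-count traffic.
void byConstRef(boost::shared_ptr<Foo> const &p) { p->data = 2; }
int main()
{
    boost::shared_ptr<Foo> p = boost::make_shared<Foo>();
    byValue(p);    // atomic increment + decrement
    byConstRef(p); // neither
}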
Question 2
If you want to reuse a pool of shared_ptr objects, you should look into the object pool design pattern: http://en.wikipedia.org/wiki/Object_pool_pattern
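For illustration, a minimal object pool sketch (ObjectPool, acquire and release are hypothetical names), here recycling std::vector buffers so their capacity survives between uses:
#include <memory>
#include <utility>
#include <vector>
template<typename T>
class ObjectPool
{
public:
    // Hands out a recycled object if available, otherwise allocates a new one.
    std::unique_ptr<T> acquire()
    {
        if (free_.empty())
            return std::unique_ptr<T>(new T);
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }
    // Returns an object to the pool instead of destroying it.
    void release(std::unique_ptr<T> obj)
    {
        free_.push_back(std::move(obj));
    }
private:
    std::vector<std::unique_ptr<T>> free_;
};
int main()
{
    ObjectPool<std::vector<int>> pool;
    auto v = pool.acquire();    // allocated on first use
    v->push_back(42);
    v->clear();                 // reset contents, capacity is kept
    pool.release(std::move(v)); // recycled for the next acquire()
}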