GCC codegen: What does pthread_key_create() have to do with std::shared_ptr copying?

Tags:

c++

assembly

shared-ptr

x86-64

gnu

While comparing assembly for std::shared_ptr vs. boost::shared_ptr, I noticed that GCC generates a whole lot more code for

void test_copy(const std::shared_ptr<int> &sp) { auto copy = sp; }

(https://godbolt.org/z/efTW6MoEh – more than 70 lines of assembler) than for the boost version, on which GCC's implementation of shared_ptr is based:

void test_copy(const boost::shared_ptr<int> &sp) { auto copy = sp; }

(https://godbolt.org/z/3aoGq1f9P – around 30 lines of assembler).

In particular, I'm puzzled by the following instruction in the std::shared_ptr version, a mention of which I can't (readily) find in the sources.

movq    __gthrw___pthread_key_create(unsigned int*, void (*)(void*))@GOTPCREL(%rip), %rbx

Can someone shed some light as to why std::shared_ptr generates so much more code than boost::shared_ptr? Am I missing some magic command line option?

asked Jul 10 '21 16:07

Marc Mutz - mmutz

1 Answer

I think this is because GCC's libstdc++ checks at run time whether the program is actually multithreaded. If it's not, it can skip the expensive locked instructions that atomically modify the reference counter and fall back to ordinary unlocked instructions. Boost doesn't have this feature and uses the locked instructions unconditionally.

For instance, in the libstdc++ code, you'll notice that if the pointer __gthrw___pthread_key_create is null, we increment and decrement the reference counter at [rbp+8] with simple non-atomic instructions (lines 12 and 16-18 of the assembly). If it's non-null, we instead branch to a section where locked add/xadd instructions are used (lines 52-58).

I haven't really dug into the source code, but I suspect these details are buried in the references to _Lock_policy.
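
To make the pattern concrete, here is a minimal sketch of the kind of dispatch I believe is going on. It is not libstdc++'s actual code: as far as I know the real helpers are `__gthread_active_p()` and `__gnu_cxx::__atomic_add_dispatch()` from `<ext/atomicity.h>`, whereas the names `hypothetical_thread_marker`, `threads_active`, `add_dispatch` and `copy_then_destroy` below are made up for illustration. It assumes GCC or Clang, since it relies on the weak attribute and the `__atomic` builtins.

```cpp
#include <cassert>

// Hypothetical stand-in for the weak reference to __pthread_key_create.
// Because it is declared weak, its address is null if nothing in the
// final link defines it (an assumption of this sketch).
extern "C" void hypothetical_thread_marker() __attribute__((weak));

// Rough analogue of a "could threads be running?" check:
// true only if the weak symbol was resolved at link time.
inline bool threads_active() {
    return &hypothetical_thread_marker != nullptr;
}

// Add `delta` to `*counter`, paying for an atomic operation
// only when threads might be running.
inline void add_dispatch(int* counter, int delta) {
    if (threads_active()) {
        // Thread-safe path: a lock-prefixed add/xadd on x86-64.
        __atomic_fetch_add(counter, delta, __ATOMIC_ACQ_REL);
    } else {
        // Single-threaded path: plain load, add, store.
        *counter += delta;
    }
}

// Mimics what `auto copy = sp;` followed by the destruction of `copy`
// does to the use count.
inline int copy_then_destroy(int use_count) {
    assert(use_count > 0);
    add_dispatch(&use_count, 1);   // copy constructor
    add_dispatch(&use_count, -1);  // destructor of the copy
    return use_count;
}
```

Both branches have to be present in the generated code because the test happens at run time, which would account for the extra assembly compared with Boost's unconditional locked instructions.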

answered Oct 21 '22 21:10

Nate Eldredge
