While comparing assembly for std::shared_ptr vs. boost::shared_ptr, I noticed that GCC generates a whole lot more code for
void test_copy(const std::shared_ptr<int> &sp) { auto copy = sp; }
(https://godbolt.org/z/efTW6MoEh – more than 70 lines of assembler) than for the boost version, on which GCC's implementation of shared_ptr is based:
void test_copy(const boost::shared_ptr<int> &sp) { auto copy = sp; }
(https://godbolt.org/z/3aoGq1f9P – around 30 lines of assembler).
In particular, I'm puzzled by the following instruction in the std::shared_ptr version, a mention of which I can't (readily) find in the sources.
movq __gthrw___pthread_key_create(unsigned int*, void (*)(void*))@GOTPCREL(%rip), %rbx
Can someone shed some light as to why std::shared_ptr generates so much more code than boost::shared_ptr? Am I missing some magic command line option?
C programs on POSIX systems can use the POSIX threads (pthreads) library for multithreading, that is, creating multiple threads and running them concurrently. One of its functions is pthread_create, which starts a new thread that executes a given function.
pthread_key_create() fails with [EAGAIN] if the system lacked the necessary resources to create another thread-specific data key, or if the system-imposed limit on the total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded. It fails with [ENOMEM] if insufficient memory exists to create the key. The pthread_key_create() function shall not return an error code of [EINTR].
In the obsolete LinuxThreads implementation, each of the threads in a process has a different process ID. This is in violation of the POSIX threads specification, and is the source of many other nonconformances to the standard; see pthreads(7).
I think this is because GCC's libstdc++ is checking whether the program is actually multithreaded. If it's not, then it can skip the expensive locked instructions to atomically modify the reference counter, and revert to ordinary unlocked instructions. Boost doesn't have this feature and uses the locked instructions unconditionally.
For instance, in the libstdc++ code, you'll notice that if the pointer __gthrw___pthread_key_create is null, we increment and decrement the reference counter at [rbp+8] with simple non-atomic instructions (lines 12 and 16-18 of the assembly). But if it's not, then we branch to a section where locked add/xadd are done (lines 52-58).
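The branch described above can be sketched roughly as follows. This is not libstdc++'s actual code: add_ref, release_ref, thread_probe and dummy_probe are hypothetical names, and the null-pointer test only models the check on __gthrw___pthread_key_create. __atomic_fetch_add and __atomic_fetch_sub are GCC/Clang builtins, so this assumes one of those compilers.

```cpp
#include <cassert>

// Hypothetical stand-in for the weak pthread symbol that libstdc++ probes.
// A null pointer models "no thread library is linked in".
using thread_probe = void (*)();

// Increment a reference count, taking the cheap path when no threads can exist.
inline void add_ref(int &count, thread_probe probe) {
    if (probe == nullptr) {
        ++count;  // single-threaded: plain add, no lock prefix needed
    } else {
        __atomic_fetch_add(&count, 1, __ATOMIC_ACQ_REL);  // locked add/xadd on x86
    }
}

// Decrement a reference count; returns true if this was the last reference.
inline bool release_ref(int &count, thread_probe probe) {
    if (probe == nullptr) {
        return --count == 0;
    }
    // fetch_sub returns the value held before the subtraction.
    return __atomic_fetch_sub(&count, 1, __ATOMIC_ACQ_REL) == 1;
}

// Hypothetical non-null probe, used only to force the atomic path.
inline void dummy_probe() {}

// Mirrors test_copy from the question: one increment, then one decrement.
inline void example_copy(int &count, thread_probe probe) {
    add_ref(count, probe);                  // auto copy = sp;
    bool last = release_ref(count, probe);  // copy goes out of scope
    assert(!last);                          // the original still holds a reference
}
```

Both paths have to be emitted for every copy, because which one runs is only known at run time. That would account for the extra code relative to a version that always uses the locked instructions.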
I haven't really dug into the source code, but I suspect these details are buried in the references to _Lock_policy.
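As a hedged illustration of where _Lock_policy shows up: libstdc++ implements std::shared_ptr on top of an internal template, std::__shared_ptr<T, _Lp>, whose second parameter is a __gnu_cxx::_Lock_policy. The sketch below assumes libstdc++ and picks the _S_single policy by hand. It relies on implementation details that are not part of the C++ standard and may change, and the names unsync_shared_ptr and copy_and_count are made up for this example.

```cpp
#include <memory>
#include <ext/concurrence.h>  // libstdc++-specific: __gnu_cxx::_Lock_policy

// Assumption: libstdc++ internals. _S_single requests non-atomic
// reference counting regardless of whether threads are present.
template <typename T>
using unsync_shared_ptr = std::__shared_ptr<T, __gnu_cxx::_S_single>;

// Copy a pointer once and report the resulting reference count.
inline long copy_and_count() {
    unsync_shared_ptr<int> sp(new int(42));
    auto copy = sp;           // increments the count without locked instructions
    return copy.use_count();  // two owners: sp and copy
}
```

With the policy fixed at compile time there is no runtime check to make, so this is only safe if the pointer is never shared across threads.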