Are function-local static mutexes thread-safe?

Tags:

c++

c++11

In the following program I attempt to make the print function thread-safe by using a function-local mutex object:

    #include <iostream>
    #include <chrono>
    #include <mutex>
    #include <string>
    #include <thread>

    void print(const std::string & s)
    {
        // Thread safe?
        static std::mutex mtx;
        std::unique_lock<std::mutex> lock(mtx);
        std::cout << s << std::endl;
    }

    int main()
    {
        std::thread([&](){ for (int i = 0; i < 10; ++i) print("a" + std::to_string(i)); }).detach();
        std::thread([&](){ for (int i = 0; i < 10; ++i) print("b" + std::to_string(i)); }).detach();
        std::thread([&](){ for (int i = 0; i < 10; ++i) print("c" + std::to_string(i)); }).detach();
        std::thread([&](){ for (int i = 0; i < 10; ++i) print("d" + std::to_string(i)); }).detach();
        std::thread([&](){ for (int i = 0; i < 10; ++i) print("e" + std::to_string(i)); }).detach();
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

Is this safe?

My doubts arise from this question, which presents a similar case.

asked Dec 31 '12 22:12

StackedCrooked

2 Answers

C++11

In C++11 and later versions: yes, this pattern is safe. In particular, initialization of function-local static variables is thread-safe, so your code above works safely across threads.
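One way to observe this guarantee is to count constructor runs while several threads race into the same function. The following is only a minimal sketch: `Counted`, `instance()` and `construct_from_threads()` are hypothetical names invented for illustration and are not part of the question's code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper type: counts how many times its constructor runs.
struct Counted {
    static std::atomic<int> constructions;
    Counted() { ++constructions; }
};
std::atomic<int> Counted::constructions{0};

// The function-local static is initialized on first entry. C++11 requires
// concurrent callers to wait until that initialization has completed.
Counted& instance() {
    static Counted c;
    return c;
}

// Hypothetical driver: enters instance() from several threads at once and
// returns the total number of constructor runs observed afterwards.
int construct_from_threads(int n_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([] { (void)instance(); });
    for (auto& t : threads)
        t.join();
    return Counted::constructions.load();
}
```

On a conforming C++11 implementation the constructor should run exactly once, no matter how many threads are started or how often the driver is called.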

The way this works in practice is that the compiler inserts any necessary boilerplate in the function itself to check whether the variable is initialized prior to access. In the case of std::mutex as implemented in gcc, clang and icc, however, the initialized state is all-zeros, so no explicit initialization is needed (the variable will live in the all-zeros .bss section so the initialization is "free"), as we see from the assembly¹:

    inc(int& i):
            mov     eax, OFFSET FLAT:_ZL28__gthrw___pthread_key_createPjPFvPvE
            test    rax, rax
            je      .L2
            push    rbx
            mov     rbx, rdi
            mov     edi, OFFSET FLAT:_ZZ3incRiE3mtx
            call    _ZL26__gthrw_pthread_mutex_lockP15pthread_mutex_t
            test    eax, eax
            jne     .L10
            add     DWORD PTR [rbx], 1
            mov     edi, OFFSET FLAT:_ZZ3incRiE3mtx
            pop     rbx
            jmp     _ZL28__gthrw_pthread_mutex_unlockP15pthread_mutex_t
    .L2:
            add     DWORD PTR [rdi], 1
            ret
    .L10:
            mov     edi, eax
            call    _ZSt20__throw_system_errori

Note that starting at the line mov edi, OFFSET FLAT:_ZZ3incRiE3mtx it simply loads the address of the inc::mtx function-local static and calls pthread_mutex_lock on it, without any initialization. The code before that dealing with pthread_key_create is apparently just checking if the pthreads library is present at all.

There's no guarantee, however, that all implementations will implement std::mutex as all-zeros, so you might in some cases incur ongoing overhead on each call to check if the mutex has been initialized. Declaring the mutex outside the function would avoid that.

Here's an example contrasting the two approaches with a stand-in mutex2 class with a non-inlinable constructor (so the compiler can't determine that the initial state is all-zeros):

    #include <mutex>

    class mutex2 {
    public:
        mutex2();
        void lock();
        void unlock();
    };

    void inc_local(int &i)
    {
        // Thread safe?
        static mutex2 mtx;
        std::unique_lock<mutex2> lock(mtx);
        i++;
    }

    mutex2 g_mtx;

    void inc_global(int &i)
    {
        std::unique_lock<mutex2> lock(g_mtx);
        i++;
    }

The function-local version compiles (on gcc) to:

    inc_local(int& i):
            push    rbx
            movzx   eax, BYTE PTR _ZGVZ9inc_localRiE3mtx[rip]
            mov     rbx, rdi
            test    al, al
            jne     .L3
            mov     edi, OFFSET FLAT:_ZGVZ9inc_localRiE3mtx
            call    __cxa_guard_acquire
            test    eax, eax
            jne     .L12
    .L3:
            mov     edi, OFFSET FLAT:_ZZ9inc_localRiE3mtx
            call    _ZN6mutex24lockEv
            add     DWORD PTR [rbx], 1
            mov     edi, OFFSET FLAT:_ZZ9inc_localRiE3mtx
            pop     rbx
            jmp     _ZN6mutex26unlockEv
    .L12:
            mov     edi, OFFSET FLAT:_ZZ9inc_localRiE3mtx
            call    _ZN6mutex2C1Ev
            mov     edi, OFFSET FLAT:_ZGVZ9inc_localRiE3mtx
            call    __cxa_guard_release
            jmp     .L3
            mov     rbx, rax
            mov     edi, OFFSET FLAT:_ZGVZ9inc_localRiE3mtx
            call    __cxa_guard_abort
            mov     rdi, rbx
            call    _Unwind_Resume

Note the large amount of boilerplate dealing with the __cxa_guard_* functions. First, a rip-relative flag byte, _ZGVZ9inc_localRiE3mtx², is checked: if it is non-zero, the variable has already been initialized and we fall into the fast path. No atomic operations are needed because on x86, loads already have the needed acquire semantics.

If this check fails, we go to the slow path, which is essentially a form of double-checked locking: the initial check is not sufficient to determine that the variable needs initialization because two or more threads may be racing here. The __cxa_guard_acquire call does the locking and the second check, and may either fall through to the fast path as well (if another thread concurrently initialized the object), or may jump down to the actual initialization code at .L12.
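The fast-path/slow-path structure described above can be sketched by hand in portable C++. This is not what the compiler or the __cxa_guard_* runtime literally does; `g_initialized`, `g_guard_lock`, `run_initializer()` and `ensure_initialized()` are hypothetical stand-ins chosen only to illustrate the double-checked pattern.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-ins for the compiler-generated guard state.
std::atomic<bool> g_initialized{false};  // plays the role of the guard byte
std::mutex g_guard_lock;                 // plays the role of the lock taken in the slow path
int g_init_runs = 0;                     // counts how often the "constructor" ran

// Stand-in for the object's constructor.
void run_initializer() { ++g_init_runs; }

void ensure_initialized() {
    // Fast path: a single acquire load, no lock taken.
    if (g_initialized.load(std::memory_order_acquire))
        return;

    // Slow path: take the lock, then check again, because another thread
    // may have finished the initialization while we were waiting.
    std::lock_guard<std::mutex> lock(g_guard_lock);
    if (!g_initialized.load(std::memory_order_relaxed)) {
        run_initializer();
        // Publish the result so later fast-path loads observe it.
        g_initialized.store(true, std::memory_order_release);
    }
}
```

On x86 the acquire load compiles to a plain load, which matches the observation that the fast path needs no atomic read-modify-write instructions.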

Finally, note that the last 5 instructions in the assembly aren't directly reachable from the function at all, as they are preceded by an unconditional jmp .L3 and nothing jumps to them. They are there to be jumped to by an exception handler should the call to the constructor mutex2() throw an exception at some point.
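The effect of that exception path is visible from C++: if the constructor of a function-local static throws, the initialization counts as not done and is attempted again the next time control reaches the declaration. A minimal sketch, where `Flaky`, `g_attempts` and `try_touch()` are hypothetical names made up for this illustration:

```cpp
#include <stdexcept>

int g_attempts = 0;  // hypothetical counter of constructor attempts

// Hypothetical type whose constructor fails the first time it runs.
struct Flaky {
    Flaky() {
        if (++g_attempts == 1)
            throw std::runtime_error("first attempt fails");
    }
};

// Returns true if the static is initialized after this call,
// false if this call's initialization attempt threw.
bool try_touch() {
    try {
        static Flaky f;  // if the constructor throws, the guard is reset
        (void)f;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The first call reports failure, the second call runs the constructor again and succeeds, and later calls skip construction entirely.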

Overall, we can say that the runtime cost of the first-access initialization is low to moderate, because the fast path only checks a single-byte flag without any expensive instructions (and the remainder of the function itself usually implies at least two atomic operations for mutex.lock() and mutex.unlock()), but it comes at a significant code size increase.

Compare to the global version, which is identical except that initialization happens during global initialization rather than before first access:

    inc_global(int& i):
            push    rbx
            mov     rbx, rdi
            mov     edi, OFFSET FLAT:g_mtx
            call    _ZN6mutex24lockEv
            add     DWORD PTR [rbx], 1
            mov     edi, OFFSET FLAT:g_mtx
            pop     rbx
            jmp     _ZN6mutex26unlockEv

The function is less than a third of the size without any initialization boilerplate at all.

Prior to C++11

Prior to C++11, however, this is generally not safe, unless your compiler makes some special guarantees about the way in which static locals are initialized.

Some time ago, while looking at a similar issue, I examined the assembly generated by Visual Studio for this case. The pseudocode for the generated assembly code for your print method looked something like this:

    void print(const std::string & s)
    {
        if (!init_check_print_mtx) {
            init_check_print_mtx = true;
            mtx.mutex();  // call mutex() ctor for mtx
        }

        // ... rest of method
    }

The init_check_print_mtx is a compiler-generated global variable specific to this method which tracks whether the local static has been initialized. Note that inside the "one time" initialization block guarded by this variable, the variable is set to true before the mutex is initialized.

I thought this was silly, since it ensures that other threads racing into this method will skip the initializer and use an uninitialized mtx (versus the alternative of possibly initializing mtx more than once), but in fact doing it this way allows you to avoid the infinite recursion issue that occurs if std::mutex() were to call back into print, and this behavior is in fact mandated by the standard.

Nemo above mentions that this has been fixed (more precisely, re-specified) in C++11 to require a wait for all racing threads, which would make this safe, but you'll need to check your own compiler for compliance. I didn't check if in fact the new spec includes this guarantee, but I wouldn't be at all surprised given that local statics were pretty much useless in multi-threaded environments without this (except perhaps for primitive values which didn't have any check-and-set behavior because they just referred directly to an already initialized location in the .data segment).

¹ Note that I changed the print() function to a slightly simpler inc() function that just increments an integer in the locked region. This has the same locking structure and implications as the original, but avoids a bunch of code dealing with the << operators and std::cout.

² Using c++filt this de-mangles to guard variable for inc_local(int&)::mtx.

answered Sep 21 '22 11:09

BeeOnRope

This is not the same as the linked question for several reasons.

The linked question is not C++11, but yours is. In C++11 initialization of function-local static variables is always safe. Prior to C++11 it was only safe with some compilers, e.g. GCC and Clang, which default to thread-safe initialization.

The linked question initializes the reference by calling a function, which is dynamic initialization and happens at run-time. The default constructor for std::mutex is constexpr so your static variable has constant initialization, i.e. the mutex can be initialized at compile-time (or link-time) so there is nothing to do dynamically at runtime. Even if multiple threads call the function concurrently there's nothing they actually need to do before using the mutex.

Your code is safe (assuming your compiler implements the C++11 rules correctly).
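To exercise the pattern in a way that can be checked, here is a sketch of a variation on the question's program. It is an illustration only: `record()`, `g_lines` and `run_workers()` are hypothetical names, the output goes into a vector instead of std::cout so the result can be inspected, and the threads are joined rather than detached so nothing depends on a sleep.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sink replacing std::cout so the result can be inspected.
std::vector<std::string> g_lines;

void record(const std::string& s) {
    // std::mutex has a constexpr default constructor, so this static is
    // constant-initialized and needs no run-time setup.
    static std::mutex mtx;
    std::lock_guard<std::mutex> lock(mtx);
    g_lines.push_back(s);
}

// Starts five workers ('a' through 'e'), each recording ten lines, waits
// for all of them and returns the number of lines recorded.
std::size_t run_workers() {
    std::vector<std::thread> workers;
    for (char tag = 'a'; tag <= 'e'; ++tag) {
        workers.emplace_back([tag] {
            for (int i = 0; i < 10; ++i)
                record(std::string(1, tag) + std::to_string(i));
        });
    }
    for (auto& w : workers)
        w.join();
    return g_lines.size();
}
```

If the mutex did not serialize the calls, concurrent push_back calls on the shared vector would be a data race. With the lock in place, all 50 lines are recorded.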

answered Sep 21 '22 11:09

Jonathan Wakely
