I just ran across this technique for running code once per thread. I don't know how it works at the lowest level, though. Specifically, what is fs pointing to? What does .zero 8 mean? Is there a reason the identifier is @tpoff?
int foo();
void bar()
{
    thread_local static auto _ = foo();
}
Output (with -O2):
bar():
        cmp     BYTE PTR fs:guard variable for bar()::_@tpoff, 0
        je      .L8
        ret
.L8:
        sub     rsp, 8
        call    foo()
        mov     BYTE PTR fs:guard variable for bar()::_@tpoff, 1
        add     rsp, 8
        ret
guard variable for bar()::_:
        .zero   8
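The fs segment base is the address of thread-local storage (on x86-64 Linux at least). Each thread has its own fs base, so the same fs-relative offset reaches a different copy of the variable in each thread.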
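To make the "one copy per thread" point concrete, here is a minimal C++ sketch. The names tls_counter and tls_address_differs_per_thread are hypothetical, made up for this illustration; it only checks that a second thread sees the thread_local variable at a different address than the main thread does.

```cpp
#include <cassert>
#include <thread>

// Hypothetical variable for illustration: every thread gets its own copy.
thread_local int tls_counter = 0;

// Returns true if a freshly started thread sees tls_counter at a different
// address than the calling thread does.
bool tls_address_differs_per_thread()
{
    int* caller_addr = &tls_counter;   // address of the calling thread's copy
    bool differs = false;

    // Compare inside the worker, while both threads' copies are alive.
    std::thread worker([&] { differs = (&tls_counter != caller_addr); });
    worker.join();

    // Within one thread the address stays the same.
    assert(caller_addr == &tls_counter);
    return differs;
}
```

The source code names the same variable in both threads; only the thread-local base address (fs on x86-64 Linux) differs between them.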
.zero 8 reserves 8 bytes of zeros (presumably in the BSS, or for a thread-local like this one its TLS counterpart, .tbss). Check the GAS manual: https://sourceware.org/binutils/docs/as/Zero.html, linked from https://stackoverflow.com/tags/x86/info.
@tpoff means the symbol is addressed relative to thread-local storage: it stands for thread-pointer offset, i.e. the variable's offset from the thread pointer that fs holds. The linker fills in the actual value.
The rest of it looks similar to what gcc normally does for static local variables that need a runtime initializer: a guard variable that it checks every time it enters the function, falling through in the already-initialized case.
The 1-byte guard variable is in thread-local storage (only the first of the 8 reserved bytes is ever accessed). The actual _ itself is optimized away because it's never read. Notice there's no store of eax after foo returns.
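As a rough sketch of what that asm does, the guard logic can be written by hand in C++. This is an illustration under assumptions, not the compiler's actual internals: bar_by_hand, foo_calls, run_on_threads, and the counting body of foo are all hypothetical, added only so the behavior can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> foo_calls{0};     // illustration only: counts calls to foo()

int foo() { return ++foo_calls; }  // hypothetical stand-in for the question's foo()

// Hand-written approximation of: thread_local static auto _ = foo();
// when the value of _ is never read.
void bar_by_hand()
{
    thread_local static bool guard = false;  // zero-initialized, one per thread
    if (!guard) {      // cmp BYTE PTR fs:guard@tpoff, 0 / je .L8
        foo();         // call foo(), return value discarded
        guard = true;  // mov BYTE PTR fs:guard@tpoff, 1
    }
}

// Calls bar_by_hand() three times on each of n threads (one after another)
// and returns how many times foo() actually ran.
int run_on_threads(int n)
{
    assert(n > 0);
    foo_calls = 0;
    for (int i = 0; i < n; ++i) {
        std::thread t([] { bar_by_hand(); bar_by_hand(); bar_by_hand(); });
        t.join();
    }
    return foo_calls;
}
```

With this sketch, run_on_threads(4) should report 4 calls: one per thread, no matter how many times each thread calls bar_by_hand().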
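BTW, _ is a weird (bad) choice for a variable name. It's easy to miss. Names that begin with an underscore are reserved for the implementation only at global scope, so a lone _ is technically legal as a local here, but it's still asking for confusion.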
There's a nice optimization here: normally (for a non-thread-local static int var = foo();) if the code finds the guard variable isn't already initialized, it needs a thread-safe way to make sure only one thread actually does the initialization (essentially taking a lock).
But here each thread has its own guard variable (and should run foo() the first time regardless of what other threads are doing), so it doesn't need to call a run-once helper (__cxa_guard_acquire / __cxa_guard_release in gcc's usual output for static locals) to get mutual exclusion.
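The difference in behavior is visible from plain C++, without looking at the asm. The following is a small sketch; all names in it (init_shared, init_per_thread, use_shared, use_per_thread, run_threads, and the two counters) are hypothetical and exist only for this illustration. It counts how often each kind of initializer runs when several threads hit the same functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> shared_inits{0};      // runs of the plain static initializer
std::atomic<int> per_thread_inits{0};  // runs of the thread_local initializer

int init_shared()     { return ++shared_inits; }
int init_per_thread() { return ++per_thread_inits; }

// Plain static local: one shared guard, initialization needs mutual exclusion.
void use_shared()     { static int v = init_shared(); (void)v; }

// thread_local static local: one guard per thread, no locking needed.
void use_per_thread() { thread_local static int v = init_per_thread(); (void)v; }

// Starts n threads that each call both functions twice, then joins them.
void run_threads(int n)
{
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] {
            use_shared();     use_per_thread();
            use_shared();     use_per_thread();
        });
    for (auto& t : threads) t.join();
}
```

After run_threads(8), shared_inits should be 1 (only one thread wins the initialization) while per_thread_inits should be 8 (every thread initializes its own copy).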
(Sorry for the short answer. I may expand this later with an example on https://godbolt.org/ of a non-thread-local static local variable, or find an SO Q&A about it.)