How does the gcc `__thread` work?

People also ask

How does thread-local storage work?

With thread local storage (TLS), you can provide unique data for each thread that the process can access using a global index. One thread allocates the index, which can be used by the other threads to retrieve the unique data associated with the index.

What is __thread in C?

The __thread storage class marks a static variable as having thread-local storage duration. This means that, in a multi-threaded application, a unique instance of the variable is created for each thread that uses it, and destroyed when the thread terminates.

What is thread-local in C++?

In C++, thread_local is a storage class specifier that gives a variable thread storage duration: each thread gets its own instance, created when the thread starts and destroyed when the thread ends. Such data is known as thread-local storage.

Recent versions of GCC (e.g. GCC 5) support C11 and its thread_local (when compiling with e.g. gcc -std=c11). As FUZxxl commented, instead of C11's thread_local you could use the __thread qualifier, which older GCC versions also support. Read about Thread Local Storage.
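As a minimal sketch of what "one instance per thread" means, the following assumes GCC or Clang with POSIX threads (compile with gcc -pthread). The names counter, bump and counters_are_independent are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <pthread.h>

/* One instance of `counter` exists per thread (GCC extension;
   C11 spells the same storage class _Thread_local). */
static __thread int counter = 0;

/* Hypothetical worker: increments its own thread's counter n times,
   then hands the private total back through the argument. */
static void *bump(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        counter++;
    *(int *)arg = counter;
    return NULL;
}

/* Runs two workers. Returns 1 if neither worker saw the other's
   increments and the calling thread's copy is still 0. */
int counters_are_independent(void) {
    int a = 1000, b = 2000;
    pthread_t ta, tb;
    int rc = pthread_create(&ta, NULL, bump, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, bump, &b);
    assert(rc == 0);
    (void)rc;
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a == 1000 && b == 2000 && counter == 0;
}
```

No locking is needed around counter++ here, because no two threads ever touch the same instance of the variable.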

pthread_getspecific is indeed comparatively slow, since every access involves a function call. (It belongs to the POSIX threads library, so it is provided not by GCC but by the C library, e.g. GNU glibc or musl-libc.) Using thread_local variables will very probably be faster.

Look at musl's thread/pthread_getspecific.c source file for an example implementation. Read this answer to a related question.
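For comparison, here is a rough sketch of the key-based POSIX interface in use. It relies only on the standard pthread_key_create, pthread_once, pthread_getspecific and pthread_setspecific calls. The names slot_key, make_key and thread_slot are made up for this example (compile with gcc -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t slot_key;                      /* hypothetical key */
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

/* Creates the key exactly once. free() is registered as destructor and
   runs on each thread's non-NULL value when that thread exits. */
static void make_key(void) {
    int rc = pthread_key_create(&slot_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Returns the calling thread's private int, allocating it (zeroed)
   on first use. Returns NULL if allocation or registration fails. */
int *thread_slot(void) {
    pthread_once(&slot_once, make_key);
    int *p = pthread_getspecific(slot_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p != NULL && pthread_setspecific(slot_key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return p;
}
```

Every read goes through at least one library call (pthread_getspecific), which is the overhead that a __thread variable avoids.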

And __thread and thread_local are (often) not magically translated into calls to pthread_getspecific. They usually involve a specific addressing mode and/or register, with help from the compiler, the linker and the runtime system. The details are implementation specific and tied to the ABI; on Linux, I guess that since x86-64 has more registers and addressing modes, its implementation of TLS is faster than on i386. Conversely, some implementations of pthread_getspecific might themselves use internal thread_local variables (inside your implementation of POSIX threads).
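One way to observe that the compiler, linker and runtime resolve a __thread variable to per-thread storage is to compare its address in two threads that are alive at the same time. This is only an illustrative sketch: report_address and tls_addresses_differ are hypothetical names, and GCC with POSIX threads is assumed (compile with gcc -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static __thread int data;

/* Stores the address of this thread's instance of `data` as an integer. */
static void *report_address(void *out) {
    *(uintptr_t *)out = (uintptr_t)&data;
    return NULL;
}

/* Returns 1 if a second thread saw `data` at a different address than
   the calling thread does. Both instances exist while the second
   thread runs, so they cannot share storage. */
int tls_addresses_differ(void) {
    uintptr_t other = 0;
    pthread_t t;
    int rc = pthread_create(&t, NULL, report_address, &other);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return other != 0 && other != (uintptr_t)&data;
}
```

The same source-level name therefore denotes a different object in each thread, which is why the generated code addresses it relative to a per-thread base rather than at a fixed location.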

As an example, compiling the following code

#include <pthread.h>

extern const pthread_key_t key;

__thread int data;

int
get_data (void) {
  return data;
}

int
get_by_key (void) {
  return *(int*) (pthread_getspecific (key));
}

using GCC 5.2 (on Debian/Sid) with gcc -m32 -S -O2 -fverbose-asm gives the following code for get_data using TLS:

  .type get_data, @function
get_data:
.LFB3:
  .cfi_startproc
  movl  %gs:data@ntpoff, %eax   # data,
  ret
  .cfi_endproc

and the following code of get_by_key with an explicit call to pthread_getspecific:

get_by_key:
.LFB4:
  .cfi_startproc
  subl  $24, %esp   #,
  .cfi_def_cfa_offset 28
  pushl key # key
  .cfi_def_cfa_offset 32
  call  pthread_getspecific #
  movl  (%eax), %eax    # MEM[(int *)_4], MEM[(int *)_4]
  addl  $28, %esp   #,
  .cfi_def_cfa_offset 4
  ret
  .cfi_endproc

Hence using TLS with __thread (or thread_local in C11) should probably be faster than using pthread_getspecific (avoiding the overhead of a call).

Notice that thread_local is a convenience macro defined in <threads.h> (a C11 standard header).

GCC's __thread has exactly the same semantics as C11's _Thread_local. You don't say which platform you are targeting, and the implementation details vary between platforms. For example, on x86-64 Linux, GCC should compile accesses to thread-local variables as memory instructions with a %fs segment prefix (%gs on 32-bit x86) instead of calling pthread_getspecific.
