thread_local at block scope

Tags: c++, multithreading, thread-local-storage

What is the use of a thread_local variable at block scope?

If a compilable sample helps to illustrate the question, here it is:

#include <thread>
#include <iostream>

namespace My {
    void f(int *const p) {++*p;}
}

int main()
{
    thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n";
    return 0;
}

Output: 43

In the sample, the new thread gets its own n but (as far as I know) can do nothing interesting with it, so why bother? Does the new thread's own n have any use? And if it has no use, then what is the point?

Naturally, I assume that there is a point. I just do not know what the point might be. This is why I ask.

If the new thread's own n needs (as I suppose) special handling by the CPU at runtime, perhaps because, at the machine-code level, one cannot access it in the normal way via a precalculated offset from the base pointer of the new thread's stack, then are we not merely wasting machine cycles and electricity for no gain? And even if special handling were not required, I still see no gain.

So why thread_local at block scope, please?

References

Cppreference on thread_local and other storage classes
An earlier question: when exactly is a thread_local variable declared at global scope initialized?
Another earlier question: thread_local variables initialization
Yet another earlier question: the cost of thread_local

asked Mar 15 '19 21:03

thb

2 Answers

I find thread_local is only useful in three cases:

1. If you need each thread to have its own resource, so that threads don't have to share it, lock a mutex, etc. to use it. Even then, this is only worthwhile if the resource is large and/or expensive to create, or needs to persist across function invocations (i.e. an ordinary local variable inside the function will not suffice).
2. An offshoot of (1): you may need special logic to run when a calling thread eventually terminates. For this, you can use the destructor of the thread_local object created in the function. That destructor is called once for each thread that entered the code block containing the thread_local declaration (at the end of that thread's lifetime).
3. You may need some other logic to be performed exactly once for each unique thread that calls a function. For instance, you could write a function that registers each unique thread that calls it. This may sound bizarre, but I've found uses for this in managing garbage-collected resources in a library I'm developing. This usage is closely related to (1), but the object doesn't get used after its construction. It is effectively a sentry object for a thread's entire lifetime.

answered Oct 03 '22 18:10

Cruz Jean

First note that a block-local thread-local is implicitly static thread_local. In other words, your example code is equivalent to:

int main()
{
    static thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

Variables declared with thread_local inside a function are not so different from globally defined thread_locals. In both cases, you create an object that is unique per thread and whose lifetime is bound to the lifetime of the thread.

The only difference is the timing of initialization. A globally defined thread_local is initialized either when the new thread starts, before any thread-specific function runs, or, if the implementation chooses to defer it, no later than its first use in that thread. In contrast, a block-local thread_local is initialized the first time control passes through its declaration in that thread.

A use case would be to speed up a function by defining a local cache that is reused during the lifetime of the thread:

void foo() {
  static thread_local MyCache cache;
  // ...
}

(I used static thread_local here to make it explicit that the cache will be reused if the function is executed multiple times within the same thread, but it is a matter of taste. If you drop the static, it will not make any difference.)

A comment about your example code: maybe it was intentional, but the new thread is not really accessing its own thread_local n. Instead, it operates on a copy of a pointer that was created by the thread running main, so that pointer refers to the main thread's n. Because of that, both threads refer to the same memory.

In other words, a more verbose way to write your example would have been:

int main()
{
    thread_local int n {42};
    int* n_ = &n;
    std::thread t(My::f, n_);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

If you change the code so that the new thread names n itself, it will operate on its own instance, and the n belonging to the main thread will not be modified:

int main()
{
    thread_local int n {42};
    std::thread t([&] { My::f(&n); });
    t.join();
    std::cout << n << "\n"; // prints 42 (not 43)
    return 0;
}

Here is a more complicated example. It calls the function twice to show that the state is preserved between calls, and its output shows that each thread operates on its own state:

#include <iostream>
#include <thread>

void foo() {
  thread_local int n = 1;
  std::cout << "n=" << n << " (main)" << std::endl;
  n = 100;
  std::cout << "n=" << n << " (main)" << std::endl;
  int& n_ = n;
  std::thread t([&] {
          std::cout << "t executing...\n";
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          std::cout << "t executing...DONE" << std::endl;
        });
  t.join();
  std::cout << "n=" << n << " (main, after t.join())\n";
  n = 200;
  std::cout << "n=" << n << " (main)" << std::endl;

  std::thread t2([&] {
          std::cout << "t2 executing...\n";
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          std::cout << "t2 executing...DONE" << std::endl;
        });
  t2.join();
  std::cout << "n=" << n << " (main, after t2.join())" << std::endl;
}

int main() {
  foo();
  std::cout << "---\n";
  foo();
  return 0;
}

Output:

n=1 (main)
n=100 (main)
t executing...
n=1 (thread 1)      # the thread used the "n = 1" init code
n_=100 (thread 1)   # the passed reference, not the thread_local
n=2 (thread 1)      # write to the thread_local
n_=100 (thread 1)   # did not change the passed reference
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())
---
n=200 (main)        # second execution: old state is reused
n=100 (main)
t executing...
n=1 (thread 1)
n_=100 (thread 1)
n=2 (thread 1)
n_=100 (thread 1)
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())

answered Oct 03 '22 17:10

Philipp Claßen
