 

thread_local at block scope

What is the use of a thread_local variable at block scope?

If a compilable sample helps to illustrate the question, here it is:

#include <thread>
#include <iostream>

namespace My {
    void f(int *const p) {++*p;}
}

int main()
{
    thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n";
    return 0;
}

Output: 43

In the sample, the new thread gets its own n but (as far as I know) can do nothing interesting with it, so why bother? Does the new thread's own n have any use? And if it has no use, then what is the point?

Naturally, I assume that there is a point. I just do not know what the point might be. This is why I ask.

If the new thread's own n requires (as I suppose) special handling by the CPU at runtime, perhaps because at the machine-code level one cannot reach it in the normal way via a precalculated offset from the base pointer of the new thread's stack, then are we not merely wasting machine cycles and electricity for no gain? And even if no special handling were required, there would still be no gain, as far as I can see.

So why thread_local at block scope, please?

References

  • Cppreference on thread_local and other storage classes
  • An earlier question: when exactly is a thread_local variable declared at global scope initialized?
  • Another earlier question: thread_local variables initialization
  • Yet another earlier question: the cost of thread_local
asked Mar 15 '19 at 21:03 by thb


2 Answers

I find thread_local is only useful in three cases:

  1. If you need each thread to have its own instance of a resource, so that the threads don't have to share it (with a mutex, etc.) in order to use it. Even then, this is only useful if the resource is large and/or expensive to create, or needs to persist across function invocations (i.e. a local variable inside the function will not suffice).

  2. An offshoot of (1): you may need special logic to run when a calling thread eventually terminates. For this, you can use the destructor of the thread_local object created in the function. The destructor of such a thread_local object is called once for each thread whose control passed through the thread_local declaration, at the end of that thread's lifetime.

  3. You may need some other logic to be performed once for each unique thread that calls a function. For instance, you could write a function that registers each unique thread that calls it. This may sound bizarre, but I've found uses for this in managing garbage-collected resources in a library I'm developing. This usage is closely related to (1), except that the object is not used after its construction: it is effectively a sentry object for the thread's entire lifetime.

answered Oct 03 '22 at 18:10 by Cruz Jean


First, note that a block-scope thread_local variable is implicitly static thread_local. In other words, your example code is equivalent to:

int main()
{
    static thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

Variables declared with thread_local inside a function are not so different from globally defined thread_locals. In both cases, you create an object that is unique per thread and whose lifetime is bound to the lifetime of the thread.

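The only difference is when they are initialized. A thread_local defined at namespace scope is initialized either when the thread starts, before its initial function runs, or, if the implementation defers it, before the first odr-use in that thread of any thread_local variable defined in the same translation unit. In contrast, a block-scope thread_local variable is initialized the first time control passes through its declaration in a given thread.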

A use case would be to speed up a function by defining a local cache that is reused during the lifetime of the thread:

void foo() {
  static thread_local MyCache cache;
  // ...
}

(I used static thread_local here to make it explicit that the cache will be reused if the function is executed multiple times within the same thread, but it is a matter of taste. If you drop the static, it will not make any difference.)


A comment about your example code: maybe it was intentional, but the new thread is not really accessing its own thread_local n. Instead, it operates on a copy of a pointer that was created by the thread running main. Because of that, both threads refer to the same memory, namely main's n.

In other words, a more verbose way would have been:

int main()
{
    thread_local int n {42};
    int* n_ = &n;
    std::thread t(My::f, n_);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

If you change the code so that the new thread itself names n, it will operate on its own instance, and the n belonging to the main thread will not be modified:

int main()
{
    thread_local int n {42};
    std::thread t([&] { My::f(&n); });
    t.join();
    std::cout << n << "\n"; // prints 42 (not 43)
    return 0;
}

Here is a more complicated example. It calls the function twice to show that the state is preserved between calls. Its output also shows that each thread operates on its own state:

#include <iostream>
#include <thread>

void foo() {
  thread_local int n = 1;
  std::cout << "n=" << n << " (main)" << std::endl;
  n = 100;
  std::cout << "n=" << n << " (main)" << std::endl;
  int& n_ = n;
  std::thread t([&] {
          std::cout << "t executing...\n";
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          std::cout << "t executing...DONE" << std::endl;
        });
  t.join();
  std::cout << "n=" << n << " (main, after t.join())\n";
  n = 200;
  std::cout << "n=" << n << " (main)" << std::endl;

  std::thread t2([&] {
          std::cout << "t2 executing...\n";
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          std::cout << "t2 executing...DONE" << std::endl;
        });
  t2.join();
  std::cout << "n=" << n << " (main, after t2.join())" << std::endl;
}

int main() {
  foo();
  std::cout << "---\n";
  foo();
  return 0;
}

Output:

n=1 (main)
n=100 (main)
t executing...
n=1 (thread 1)      # the thread used the "n = 1" init code
n_=100 (thread 1)   # the passed reference, not the thread_local
n=2 (thread 1)      # write to the thread_local
n_=100 (thread 1)   # did not change the passed reference
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())
---
n=200 (main)        # second execution: old state is reused
n=100 (main)
t executing...
n=1 (thread 1)
n_=100 (thread 1)
n=2 (thread 1)
n_=100 (thread 1)
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())
answered Oct 03 '22 at 17:10 by Philipp Claßen