How do I use thread local storage in Python?
Thread-local data is data whose values are thread specific. To manage thread-local data, create an instance of local (or a subclass) and store attributes on it:

    mydata = threading.local()
    mydata.x = 1

The instance's values will be different for separate threads.
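As a minimal sketch of what "different for separate threads" means in practice, the snippet below sets mydata.x in the main thread and then checks from a second thread whether the attribute exists. The names worker and seen_in_worker are made up for this illustration.

```python
import threading

mydata = threading.local()
mydata.x = 1  # set in the main thread only

seen_in_worker = []  # hypothetical helper list used to report back to the main thread

def worker():
    # The worker thread starts with an empty namespace: x is not set here.
    seen_in_worker.append(hasattr(mydata, "x"))
    mydata.x = 2  # visible only inside this worker thread

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_worker)  # [False]
print(mydata.x)        # 1, the main thread's value is untouched
```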
Thread-local storage can be thought of as a scope of access, like session scope or request scope. Any object you store in a thread-local is visible everywhere within the thread that stored it (global to that thread) but invisible to every other thread (local to that thread). Other languages offer the same idea; Java, for example, provides thread-local variables through its ThreadLocal class.
With thread local storage (TLS), you can provide unique data for each thread that the process can access using a global index. One thread allocates the index, which can be used by the other threads to retrieve the unique data associated with the index.
A thread in Python is an entity within a process that can be scheduled for execution. Put simply, a thread is a sequence of instructions within a program that can be executed independently of other code.
Thread-local storage is useful, for instance, if you have a pool of worker threads and each thread needs access to its own resource, like a network or database connection. Note that the threading module uses the regular concept of threads (which have access to the process's global data), but in standard CPython builds the global interpreter lock keeps threads from running Python bytecode in parallel, so they help mainly with I/O-bound work. The multiprocessing module, by contrast, creates a new subprocess for each worker, so every global is local to its own process.
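The worker-pool case might be sketched as follows. FakeConnection is a hypothetical stand-in for a real network or database connection, and get_connection and task are names invented for this example; the point is only that each pool thread lazily creates and then reuses its own object.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # created once, at module level

class FakeConnection:
    """Hypothetical stand-in for a real network or database connection."""

    def __init__(self):
        # Remember which thread created this connection.
        self.owner = threading.current_thread().name

    def query(self, n):
        return n * n

def get_connection():
    # Create the connection lazily, the first time each thread asks for it.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = FakeConnection()
    return conn

def task(n):
    conn = get_connection()
    # The second value checks that a thread only ever sees its own connection.
    return conn.query(n), conn.owner == threading.current_thread().name

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(task, range(6)))

squares = [value for value, _ in results]
print(squares)  # [0, 1, 4, 9, 16, 25]
```

Creating the resource lazily inside get_connection is one possible design; passing an initializer function to the pool would be another.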
Here is a simple example:
    import threading
    from threading import current_thread

    threadLocal = threading.local()

    def hi():
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("Nice to meet you", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)

    hi(); hi()
This will print out:
    Nice to meet you MainThread
    Welcome back MainThread
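The example above only runs in the main thread. As a rough sketch of the same idea with several threads, the variant below records greetings in a list instead of printing them directly, so the result is easy to inspect. The names greetings, visit_twice, and worker-0 through worker-2 are assumptions made for this illustration.

```python
import threading

thread_local = threading.local()
greetings = []                 # collected (kind, thread name) pairs
greetings_lock = threading.Lock()

def hi():
    name = threading.current_thread().name
    if getattr(thread_local, "initialized", None) is None:
        thread_local.initialized = True
        kind = "first"
    else:
        kind = "again"
    with greetings_lock:
        greetings.append((kind, name))

def visit_twice():
    hi()
    hi()

threads = [threading.Thread(target=visit_twice, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread is greeted as new exactly once, regardless of the other threads.
for kind, name in sorted(greetings, key=lambda g: g[1]):
    print(kind, name)
```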
One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call. The global or class level are ideal locations.
Here is why: threading.local() actually creates a new instance each time it is called (just like any factory or class call would), so calling threading.local() multiple times keeps replacing the original object and discarding whatever was stored on it, which in all likelihood is not what one wants. When any thread accesses an existing threadLocal variable (or whatever it is called), it gets its own private view of that variable.
This won't work as intended:
    import threading
    from threading import current_thread

    def wont_work():
        threadLocal = threading.local()  # oops, this creates a new, empty local object on every call!
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("First time for", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)

    wont_work(); wont_work()
Will result in this output:
    First time for MainThread
    First time for MainThread
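With the multiprocessing module, all global variables behave as if they were thread local, since the module runs each worker in its own process rather than in a thread, and each process has its own copy of the globals.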
Consider this example, where the processed counter is effectively local to each worker process:
    from multiprocessing import Pool
    from random import random
    from time import sleep
    import os

    processed = 0

    def f(x):
        sleep(random())
        global processed
        processed += 1
        print("Processed by %s: %s" % (os.getpid(), processed))
        return x*x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        print(pool.map(f, range(10)))
It will output something like this:
    Processed by 7636: 1
    Processed by 9144: 1
    Processed by 5252: 1
    Processed by 7636: 2
    Processed by 6248: 1
    Processed by 5252: 2
    Processed by 6248: 2
    Processed by 9144: 2
    Processed by 7636: 3
    Processed by 5252: 3
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Of course, the process IDs, the count reached in each process, and the order of the lines will vary from run to run.
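For contrast, here is a sketch of the same counter using threads instead of processes. With threads, the module-level global is shared, so all increments land on one counter; the lock is added here as an assumption of good practice for a shared read-modify-write.

```python
import threading

processed = 0
lock = threading.Lock()

def f(x):
    global processed
    with lock:          # every thread updates the same shared global
        processed += 1
    return x * x

threads = [threading.Thread(target=f, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(processed)  # 10: one shared counter, not one counter per worker
```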
Thread-local storage can simply be thought of as a namespace (with values accessed via attribute notation). The difference is that each thread transparently gets its own set of attributes/values, so that one thread doesn't see the values from another thread.
Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables. Each one is a separate namespace.
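A small sketch of the "separate namespace" point: two threading.local instances, here called settings and stats (names chosen for this example), neither of which leaks attributes to the other or across threads.

```python
import threading

settings = threading.local()
stats = threading.local()

settings.name = "main"
stats.count = 10

report = {}  # hypothetical dict used to pass observations back to the main thread

def worker():
    # Neither namespace carries over values set by the main thread.
    report["settings_has_name"] = hasattr(settings, "name")
    report["stats_has_count"] = hasattr(stats, "count")
    settings.name = "worker"
    # Setting an attribute on one local object does not create it on the other.
    report["stats_has_name"] = hasattr(stats, "name")

t = threading.Thread(target=worker)
t.start()
t.join()

print(report)
print(settings.name, stats.count)  # main 10
```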
Here's a simple example:
    import threading

    class Worker(threading.Thread):
        ns = threading.local()

        def run(self):
            self.ns.val = 0
            for i in range(5):
                self.ns.val += 1
                print("Thread:", self.name, "value:", self.ns.val)

    w1 = Worker()
    w2 = Worker()
    w1.start()
    w2.start()
    w1.join()
    w2.join()
Output:
    Thread: Thread-1 value: 1
    Thread: Thread-2 value: 1
    Thread: Thread-1 value: 2
    Thread: Thread-2 value: 2
    Thread: Thread-1 value: 3
    Thread: Thread-2 value: 3
    Thread: Thread-1 value: 4
    Thread: Thread-2 value: 4
    Thread: Thread-1 value: 5
    Thread: Thread-2 value: 5
Note how each thread maintains its own counter, even though the ns attribute is a class member (and hence shared between the threads).
The same example could have used an instance variable or a local variable, but that wouldn't show much, as there's no sharing then (a dict would work just as well). There are cases where you'd need thread-local storage as instance variables or local variables, but they tend to be relatively rare (and pretty subtle).
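One such case, sketched under assumptions: an object that is itself shared between threads but needs per-thread scratch state. Recorder and its items method are hypothetical names; each Recorder instance holds its own threading.local, so state is separated both per instance and per thread.

```python
import threading

class Recorder:
    """Hypothetical class whose instances each keep a per-thread scratch list."""

    def __init__(self):
        self._local = threading.local()  # one namespace per Recorder instance

    def items(self):
        # Lazily create the list for the calling thread.
        items = getattr(self._local, "items", None)
        if items is None:
            items = self._local.items = []
        return items

a = Recorder()
b = Recorder()
a.items().append("main-a")

seen = {}

def worker():
    seen["a_before"] = list(a.items())   # empty: the main thread's entry is not visible
    a.items().append("worker-a")
    seen["a_after"] = list(a.items())
    seen["b"] = list(b.items())          # b is a separate namespace altogether

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen)
print(a.items())  # ['main-a']
```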