How do I use thread local storage in Python?

Related

What is “thread local storage” in Python, and why do I need it? - This thread appears to be focused more on when variables are shared.

Efficient way to determine whether a particular function is on the stack in Python - Alex Martelli gives a nice solution


Thread local storage in Python

2 Answers

Thread-local storage is useful, for instance, if you have a thread worker pool and each thread needs access to its own resource, like a network or database connection. Note that the threading module uses the regular concept of threads (which all have access to the process's global data), although the global interpreter lock limits how much they help with CPU-bound work. The separate multiprocessing module instead creates a new sub-process for each worker, so every global is effectively local to that worker.
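The worker-pool case described above can be sketched roughly as follows. This is a minimal illustration, not a production pattern: FakeConnection, get_connection and work are hypothetical names standing in for a real network or database client, and the standard library's ThreadPoolExecutor is assumed as the pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Created once at module level; each thread sees its own attributes on it.
_local = threading.local()


class FakeConnection:
    """Hypothetical stand-in for a real network or database connection."""

    def __init__(self):
        # Remember which thread created (and therefore owns) this connection.
        self.owner = threading.current_thread().name
        self.queries = 0

    def query(self, item):
        self.queries += 1
        return item * 2


def get_connection():
    # Lazily create one connection per thread on first use.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _local.conn = conn
    return conn


def work(item):
    conn = get_connection()
    # The connection is only ever used by the thread that created it.
    assert conn.owner == threading.current_thread().name
    return conn.query(item), conn.owner


with ThreadPoolExecutor(max_workers=3) as pool:
    outcomes = list(pool.map(work, range(12)))

results = [value for value, _ in outcomes]
owners = {owner for _, owner in outcomes}
print(results)
print("connections created:", len(owners))
```

At most one connection is created per pool thread, no matter how many items are processed, and no locking is needed around the connection itself because no other thread can reach it.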

threading module

Here is a simple example:

    import threading
    from threading import current_thread

    threadLocal = threading.local()

    def hi():
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("Nice to meet you", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)

    hi(); hi()

This will print out:

    Nice to meet you MainThread
    Welcome back MainThread

One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call. Module (global) level or class level is the ideal location.

Here is why: threading.local() creates a new, empty instance each time it is called (just like any factory or class call would), so calling it repeatedly and rebinding the same name discards the previous object along with everything stored on it, which in all likelihood is not what one wants. When any thread accesses attributes on an existing threadLocal object (or whatever it is called), it gets its own private view of them.

This won't work as intended:

    import threading
    from threading import current_thread

    def wont_work():
        threadLocal = threading.local()  # oops, this creates a new, empty local object on every call!
        initialized = getattr(threadLocal, 'initialized', None)
        if initialized is None:
            print("First time for", current_thread().name)
            threadLocal.initialized = True
        else:
            print("Welcome back", current_thread().name)

    wont_work(); wont_work()

This results in the following output:

    First time for MainThread
    First time for MainThread
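For contrast, here is a sketch of the module-level version run from several threads. The names greet, worker and seen are illustrative only. Each thread should be treated as new exactly once, because the single threadLocal object keeps a separate initialized attribute per thread.

```python
import threading

threadLocal = threading.local()  # created once, at module level
seen = []                        # shared list, only used to record what happened
seen_lock = threading.Lock()


def greet():
    # True only on the calling thread's first visit.
    first_time = not getattr(threadLocal, "initialized", False)
    threadLocal.initialized = True
    with seen_lock:
        seen.append((threading.current_thread().name, first_time))


def worker():
    greet()
    greet()


threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The sort is stable, so each thread's two records stay in call order.
for name, first_time in sorted(seen, key=lambda record: record[0]):
    print(name, "first time" if first_time else "welcome back")
```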

multiprocessing module

With multiprocessing, all global variables behave as if they were thread local, since the module runs each worker in its own process, and each process has its own copy of the globals.

Consider this example, where the processed counter behaves like thread-local storage:

    from multiprocessing import Pool
    from random import random
    from time import sleep
    import os

    processed=0

    def f(x):
        sleep(random())
        global processed
        processed += 1
        print("Processed by %s: %s" % (os.getpid(), processed))
        return x*x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        print(pool.map(f, range(10)))

It will output something like this:

    Processed by 7636: 1
    Processed by 9144: 1
    Processed by 5252: 1
    Processed by 7636: 2
    Processed by 6248: 1
    Processed by 5252: 2
    Processed by 6248: 2
    Processed by 9144: 2
    Processed by 7636: 3
    Processed by 5252: 3
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Of course, the process IDs, the count reached by each process, and the order of the lines will vary from run to run.
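To make the per-process behaviour concrete, the following sketch (a variation on the example above, with hypothetical names bump and run_pool) also checks the parent's copy of the counter after the pool has finished. Each worker process increments its own copy, so the parent's value is expected to remain 0.

```python
from multiprocessing import Pool
import os

processed = 0  # every process gets its own copy of this global


def bump(x):
    global processed
    processed += 1
    # Report which process did the work and that process's private count.
    return os.getpid(), processed, x * x


def run_pool():
    with Pool(processes=4) as pool:
        return pool.map(bump, range(10))


if __name__ == "__main__":
    rows = run_pool()
    print([square for _, _, square in rows])
    print("worker pids:", sorted({pid for pid, _, _ in rows}))
    # The parent never called bump() itself, so its counter is untouched.
    print("parent pid:", os.getpid(), "parent count:", processed)
```

Returning the counts to the parent, rather than printing them inside the workers, also shows the usual way of getting data back: through return values, since a worker's globals are not visible to the parent.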

113

answered Oct 11 '22 22:10

mbells

Thread-local storage can simply be thought of as a namespace (with values accessed via attribute notation). The difference is that each thread transparently gets its own set of attributes/values, so that one thread doesn't see the values from another thread.

Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables. Each one is a separate namespace.
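A short sketch of both points, using made-up names (settings, stats, child): two threading.local instances do not share attributes, and a value set in one thread is not visible from another.

```python
import threading

settings = threading.local()
stats = threading.local()

settings.value = "main"  # set in the starting thread, on `settings` only

observed = {}


def child():
    # A different thread starts with an empty namespace on the same object.
    observed["child_sees_value"] = hasattr(settings, "value")
    settings.value = "child"  # does not touch the other thread's value
    observed["child_value"] = settings.value


t = threading.Thread(target=child)
t.start()
t.join()

print(hasattr(stats, "value"))  # `stats` is a separate namespace from `settings`
print(observed)
print(settings.value)           # unchanged by the child thread
```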

Here's a simple example:

    import threading

    class Worker(threading.Thread):
        ns = threading.local()
        def run(self):
            self.ns.val = 0
            for i in range(5):
                self.ns.val += 1
                print("Thread:", self.name, "value:", self.ns.val)

    w1 = Worker()
    w2 = Worker()
    w1.start()
    w2.start()
    w1.join()
    w2.join()

Output (the exact interleaving of the two threads may differ from run to run):

    Thread: Thread-1 value: 1
    Thread: Thread-2 value: 1
    Thread: Thread-1 value: 2
    Thread: Thread-2 value: 2
    Thread: Thread-1 value: 3
    Thread: Thread-2 value: 3
    Thread: Thread-1 value: 4
    Thread: Thread-2 value: 4
    Thread: Thread-1 value: 5
    Thread: Thread-2 value: 5

Note how each thread maintains its own counter, even though the ns attribute is a class member (and hence shared between the threads).

The same example could have used an instance variable or a local variable, but that wouldn't show much, as there's no sharing then (a dict would work just as well). There are cases where you'd need thread-local storage as instance variables or local variables, but they tend to be relatively rare (and pretty subtle).
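One of those rarer cases, sketched under assumptions: an object that is shared between threads but needs separate per-thread state for each instance. The Tally class below is hypothetical. A class-level threading.local would give all Tally instances a single namespace per thread, so here the local is created in __init__ instead.

```python
import threading


class Tally:
    """Hypothetical shared object that keeps a separate count per thread."""

    def __init__(self):
        # One threading.local per instance: the state is private to each
        # (instance, thread) pair.
        self._local = threading.local()

    def add(self, n=1):
        self._local.count = getattr(self._local, "count", 0) + n
        return self._local.count


apples = Tally()
pears = Tally()
totals = {}


def worker(name, n_apples, n_pears):
    for _ in range(n_apples):
        apples.add()
    for _ in range(n_pears):
        pears.add()
    # add(0) returns this thread's current count without changing it.
    totals[name] = (apples.add(0), pears.add(0))


threads = [
    threading.Thread(target=worker, args=("a", 2, 5)),
    threading.Thread(target=worker, args=("b", 3, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals)
```

Both threads use the same two Tally objects, yet each thread only ever sees the counts it produced itself, and the two objects do not interfere with each other.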

answered Oct 11 '22 23:10

Paul Moore

