I am implementing a relatively simple thread pool with Python's Queue.Queue class. I have one producer class that contains the Queue instance along with some convenience methods, and a consumer class that subclasses threading.Thread. I instantiate that object for every thread I want in my pool ("worker threads," I think they're called) based on an integer.
Each worker thread takes (flag, data) off the queue, processes it using its own database connection, and places the GUID of the row onto a list so that the producer class knows when a job is done.
While I'm aware that other modules implement the functionality I'm coding, the reason I'm coding this is to gain a better understanding of how Python threading works. This brings me to my question.
If I store anything in a function's namespace or in the class's __dict__ object, will it be thread safe?
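    import sqlite3
    import threading

    class Consumer(threading.Thread):
        def __init__(self, producer, db_filename):
            threading.Thread.__init__(self)  # required before start() can be called
            self.producer = producer
            self.conn = sqlite3.connect(db_filename)  # Is this var thread safe?

        def run(self):
            flag, data = self.producer.queue.get()
            while flag != 'stop':
                # Do stuff with data; Is `data` thread safe?
                flag, data = self.producer.queue.get()  # fetch the next job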
I am thinking that both would be thread safe; here's my rationale:
Each time the class is instantiated, a new __dict__ gets created. Under the scenario I outline above, I don't think any other object would have a reference to this object. (Now, perhaps the situation might get more complicated if I used join() functionality, but I'm not...)
Each function call gets its own namespace, and I'm not declaring any of my variables global, so I don't understand how any other object would have a reference to a function variable.
This post addresses my question somewhat, but is still a little abstract for me.
Thanks in advance for clearing this up for me.
You are right; this is thread-safe. Local variables (the ones you call "function namespace") are always thread-safe, since only the thread executing the function can access them. Instance attributes are thread-safe as long as the instance is not shared across threads. Inheriting from Thread does not guarantee that by itself, but in your code nothing outside the worker touches a Consumer's attributes after it is constructed, so they are effectively private to that worker.
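As a rough illustration of the point about locals and instance attributes, here is a minimal sketch. The Worker class and its numbers and total attributes are hypothetical names made up for this example, not part of the question's code. Each thread accumulates into a local variable and then stores the result on its own instance, so no locking is needed.

```python
import threading

class Worker(threading.Thread):
    """Hypothetical worker: each instance keeps its own result."""

    def __init__(self, numbers):
        threading.Thread.__init__(self)
        self.numbers = numbers  # stored in this instance's __dict__
        self.total = 0          # written only by this worker's own thread

    def run(self):
        subtotal = 0            # local variable: lives only in this call frame
        for n in self.numbers:
            subtotal += n
        self.total = subtotal

workers = [Worker(range(i * 1000)) for i in range(1, 5)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait, so reading w.total below happens after run() finishes

totals = [w.total for w in workers]
print(totals)
```

The join() calls are there only so the main thread reads each total after the worker is done; the workers never read or write each other's attributes.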
The only "risk" here is the value of the data object: in theory, the producer might hold onto the data object after putting it into the queue, and (if the data object itself is mutable - make sure you understand what "mutable" means) may change the object while the Consumer is using it. If the producer leaves the data object alone after putting it into the queue, this is thread-safe.
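The risk described here can be seen without starting any threads at all. The following is a small sketch, assuming Python 3 (where the module is named queue rather than Queue) and a made-up job dictionary: the queue hands over a reference to the same object, so a later change by the producer is visible to whoever takes the item off the queue.

```python
import queue

q = queue.Queue()
job = {"guid": "abc", "rows": [1, 2, 3]}  # hypothetical mutable payload

q.put(("work", job))    # the queue stores a reference, not a copy
job["rows"].append(4)   # producer keeps modifying the object afterwards

flag, data = q.get()
print(data is job)      # True: consumer and producer share one object
print(data["rows"])     # [1, 2, 3, 4]
```

With real threads the same sharing applies, except that the moment at which the consumer sees the change is no longer predictable.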
To make the data thread-safe, use copy.deepcopy() to create a new copy of the data before putting it on the queue. The producer can then modify its own object in the next loop iteration without changing the consumer's copy before the consumer gets to it.
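A minimal sketch of that suggestion, again assuming Python 3's queue module and a made-up job dictionary: the item placed on the queue is an independent deep copy, so later changes by the producer do not reach it.

```python
import copy
import queue

q = queue.Queue()
job = {"guid": "abc", "rows": [1, 2, 3]}  # hypothetical mutable payload

q.put(("work", copy.deepcopy(job)))  # enqueue an independent snapshot
job["rows"].append(4)                # producer reuses its own object

flag, data = q.get()
print(data is job)      # False: the consumer holds a separate object
print(data["rows"])     # [1, 2, 3]
```

Deep copying costs time and memory proportional to the size of the payload, so for large jobs it may be preferable for the producer simply to stop touching an object once it has been queued.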