I wrote a script that has multiple threads (created with threading.Thread
) fetching URLs from a Queue
using queue.get_nowait()
, and then processing the HTML. I am new to multi-threaded programming, and am having trouble understanding the purpose of the queue.task_done()
function.
When the Queue
is empty, it automatically returns the queue.Empty
exception. So I don't understand the need for each thread to call the task_done()
function. We know that we're done with the queue when its empty, so why do we need to notify it that the worker threads have finished their work (which has nothing to do with the queue, after they've gotten the URL from it)?
Could someone provide me with a code example (ideally using urllib
, file I/O, or something other than fibonacci numbers and printing "Hello") that shows me how this function would be used in practical applications?
Queue. task_done () Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
task_done lets workers say when a task is done. Someone waiting for all the work to be done with Queue. join will wait until enough task_done calls have been made, not when the queue is empty.
To get the length of a queue in Python:Use the len() function to get the length of a deque object. Use the qsize() method to get the length of a queue object.
Although asyncio queues are not thread-safe, they are designed to be used specifically in async/await code.
Queue.task_done
is not there for the workers' benefit. It is there to support Queue.join
.
If I give you a box of work assignments, do I care about when you've taken everything out of the box?
No. I care about when the work is done. Looking at an empty box doesn't tell me that. You and 5 other guys might still be working on stuff you took out of the box.
Queue.task_done
lets workers say when a task is done. Someone waiting for all the work to be done with Queue.join
will wait until enough task_done
calls have been made, not when the queue is empty.
eigenfield points out in the comments that it seems really weird for a queue to have task_done
/join
methods. That's true, but it's really a naming problem. The queue
module has bad name choices that make it sound like a general-purpose queue library, when it's really a thread communication library.
It'd be weird for a general-purpose queue to have task_done
/join
methods, but it's entirely reasonable for an inter-thread message channel to have a way to indicate that messages have been processed. If the class was called thread_communication.MessageChannel
instead of queue.Queue
and task_done
was called message_processed
, the intent would be a lot clearer.
(If you need a general-purpose queue rather than an inter-thread message channel, use collections.deque
.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With