I'm working on a fairly large project in Python that requires one of the compute-intensive background tasks to be offloaded to another core, so that the main service isn't slowed down. I've come across some apparently strange behaviour when using multiprocessing.Queue
to communicate results from the worker process. Using the same queue for both a threading.Thread
and a multiprocessing.Process
for comparison purposes, the thread works just fine but the process fails to join after putting a large item in the queue. Observe:
import threading
import multiprocessing
class WorkerThread(threading.Thread):
def __init__(self, queue, size):
threading.Thread.__init__(self)
self.queue = queue
self.size = size
def run(self):
self.queue.put(range(size))
class WorkerProcess(multiprocessing.Process):
def __init__(self, queue, size):
multiprocessing.Process.__init__(self)
self.queue = queue
self.size = size
def run(self):
self.queue.put(range(size))
if __name__ == "__main__":
size = 100000
queue = multiprocessing.Queue()
worker_t = WorkerThread(queue, size)
worker_p = WorkerProcess(queue, size)
worker_t.start()
worker_t.join()
print 'thread results length:', len(queue.get())
worker_p.start()
worker_p.join()
print 'process results length:', len(queue.get())
I've seen that this works fine for size = 10000
, but hangs at worker_p.join()
for size = 100000
. Is there some inherent size limit to what multiprocessing.Process
instances can put in a multiprocessing.Queue
? Or am I making some obvious, fundamental mistake here?
For reference, I am using Python 2.6.5 on Ubuntu 10.04.
To answer your 1st question: for all intents and purposes, the max size of a Queue is infinite. The reason why is that if you try to put something in a Queue that is full, it will wait until a slot has opened up before it puts the next item into the queue.
In other words, Multiprocess queue is pretty slow putting and getting individual data, then QuickQueue wrap several data in one list, this list is one single data that is enqueue in the queue than is more quickly than put one individual data.
A queue is a data structure on which items can be added by a call to put() and from which items can be retrieved by a call to get(). The multiprocessing. Queue provides a first-in, first-out FIFO queue, which means that the items are retrieved from the queue in the order they were added.
Yes, it is. From https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes: Queues are thread and process safe.
Seems the underlying pipe is full, so the feeder thread blocks on the write to the pipe (actually when trying to acquire the lock protecting the pipe from concurrent access).
Check this issue http://bugs.python.org/issue8237
The answer to python multiprocessing: some functions do not return when they are complete (queue material too big) implements what you probably mean by "dequeuing" before joining" in a parallel execution of an arbitrary set of functions, whose return values get queued.
This therefore allows any size of stuff to get put into the queues, so that the limit you found does not get in the way.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With