Python multiprocessing.Queue vs multiprocessing.manager().Queue()

I have a simple task like this:

import multiprocessing
from queue import Empty  # exception raised by get_nowait() when the queue is empty


def worker(queue):
    while True:
        try:
            _ = queue.get_nowait()
        except Empty:
            break


if __name__ == '__main__':
    manager = multiprocessing.Manager()
    # queue = multiprocessing.Queue()
    queue = manager.Queue()

    for i in range(5):
        queue.put(i)

    processes = []

    for i in range(2):
        proc = multiprocessing.Process(target=worker, args=(queue,))
        processes.append(proc)
        proc.start()

    for proc in processes:
        proc.join()

It seems that multiprocessing.Queue can do all the work I need, but on the other hand I see many examples of Manager().Queue() and can't understand which one I really need. It looks like Manager().Queue() uses some sort of proxy objects, but I don't understand their purpose, because multiprocessing.Queue() does the same work without any proxy objects.

So, my questions are:

1) What is the real difference between multiprocessing.Queue and the object returned by multiprocessing.Manager().Queue()?

2) Which one should I use?

asked Apr 16 '17 by novicef


People also ask

What is Python multiprocessing queue?

Python's multiprocessing module provides a Queue class that is exactly a first-in, first-out data structure. It can store any picklable Python object (though simple ones are best) and is extremely useful for sharing data between processes.

What is multiprocessing manager Python?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. Its Manager class starts a server process that holds Python objects and lets other processes manipulate them through proxies.

Is Python multiprocessing queue thread safe?

Yes, it is. From https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes: Queues are thread and process safe.

Is multiprocessing queue slow?

Putting and getting individual items on a multiprocessing queue is fairly slow. Batching helps: QuickQueue, for example, wraps several items in one list and enqueues that list as a single item, which is quicker than putting each item individually.


1 Answer

Though my understanding of this subject is limited, from what I have done I can tell there is one main difference between multiprocessing.Queue() and multiprocessing.Manager().Queue():

  • multiprocessing.Queue() is an object, whereas multiprocessing.Manager().Queue() is an address (proxy) pointing to a shared queue managed by the multiprocessing.Manager() object.
  • Therefore you can't pass a normal multiprocessing.Queue() object to Pool methods, because it can't be pickled (see the sketch after the quoted docs below).
  • Moreover, the Python docs tell us to pay particular attention when using multiprocessing.Queue() because it can have undesired effects:

Note When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences which are a little surprising, but should not cause any practical difficulties – if they really bother you then you can instead use a queue created with a manager. After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising Queue.Empty. If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other.

Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children. Note that a queue created using a manager does not have this issue.
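
To make the "can't be pickled" point concrete, here is a minimal sketch (the worker function is illustrative, not from the original answer): passing a plain multiprocessing.Queue() as a task argument to a Pool method fails, because the argument has to be pickled to reach the pool worker.

import multiprocessing


def use_queue(q):
    q.put('hello')  # never reached when q is a plain multiprocessing.Queue()


if __name__ == '__main__':
    q = multiprocessing.Queue()
    with multiprocessing.Pool(2) as pool:
        try:
            # Task arguments are pickled before being sent to the pool workers,
            # and multiprocessing.Queue refuses to be pickled that way.
            pool.apply(use_queue, (q,))
        except RuntimeError as exc:
            # Typically something like:
            # "Queue objects should only be shared between processes through inheritance"
            print(exc)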

There is a workaround for using multiprocessing.Queue() with Pool: store the queue in a global variable and set it in every pool process at initialization:

import multiprocessing
from multiprocessing import Pool

queue = multiprocessing.Queue()


def initialize_shared(q):
    # make the queue available as a module-level global in every pool process
    global queue
    queue = q


pool = Pool(nb_process, initializer=initialize_shared, initargs=(queue,))

This will create pool processes with a correctly shared queue, but one can argue that multiprocessing.Queue() objects were not created for this use.
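
For completeness, a small self-contained sketch of that workaround, assuming a worker that just reads one item from the shared global queue (the function names and values are illustrative):

import multiprocessing
from multiprocessing import Pool


def initialize_shared(q):
    # runs once in every pool process and stores the inherited queue in a global
    global queue
    queue = q


def read_one(_):
    # each task consumes one item from the globally shared queue
    return queue.get()


if __name__ == '__main__':
    q = multiprocessing.Queue()
    for i in range(5):
        q.put(i)

    with Pool(2, initializer=initialize_shared, initargs=(q,)) as pool:
        print(pool.map(read_one, range(5)))  # e.g. [0, 1, 2, 3, 4]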

On the other hand, a manager.Queue() can be shared between pool subprocesses by passing it as a normal argument of a function.
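
As an illustration, a hedged sketch of that Manager-based approach (the worker function is made up for the example): the proxy travels as an ordinary pool.map argument, and every worker talks to the same underlying queue.

import multiprocessing


def square_into_queue(args):
    q, item = args
    q.put(item * item)  # the proxy pickles fine, so it can be a normal task argument


if __name__ == '__main__':
    manager = multiprocessing.Manager()
    q = manager.Queue()

    with multiprocessing.Pool(2) as pool:
        pool.map(square_into_queue, [(q, i) for i in range(5)])

    print(sorted(q.get() for _ in range(5)))  # [0, 1, 4, 9, 16]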

In my opinion, using multiprocessing.Manager().Queue() is fine in every case and less troublesome. There might be some drawbacks to using a manager, but I'm not aware of any.

answered Oct 12 '22 by michael