
How to use a multiprocessing.Manager()?

I have a question about multiprocessing.Manager() in Python. Here is the example:

    import multiprocessing

    def f(ns):
        ns.x *= 10
        ns.y *= 10

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        ns = manager.Namespace()
        ns.x = 1
        ns.y = 2

        print('before', ns)
        p = multiprocessing.Process(target=f, args=(ns,))
        p.start()
        p.join()
        print('after', ns)

and the output is:

    before Namespace(x=1, y=2)
    after Namespace(x=10, y=20)

So far, it works as I expected. Then I modified the code like this:

    import multiprocessing

    def f(ns):
        ns.x.append(10)
        ns.y.append(10)

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        ns = manager.Namespace()
        ns.x = []
        ns.y = []

        print('before', ns)
        p = multiprocessing.Process(target=f, args=(ns,))
        p.start()
        p.join()
        print('after', ns)

Now the output is:

    before Namespace(x=[], y=[])
    after Namespace(x=[], y=[])

I don't understand why the lists were not changed as I expected. Can anyone help me figure out what happened?

Asked by user1231470 on Feb 24 '12 at 19:02


1 Answer

Manager proxy objects are unable to propagate changes made to (unmanaged) mutable objects inside a container. In other words, if you have a manager.list() object, any changes to the managed list itself are propagated to all the other processes. But if you have a normal Python list inside that list, any changes to the inner list are not propagated, because the manager has no way of detecting the change.
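One way to see why the manager cannot detect the change is to look at what a read through the proxy returns. The sketch below is illustrative only (the function name `demo` is made up for this example): each read of `ns.x` hands back a fresh local copy of the list held by the manager's server process, so appending to that copy never reaches the server.

```python
import multiprocessing


def demo():
    """Show that reading ns.x yields an independent local copy each time."""
    with multiprocessing.Manager() as manager:
        ns = manager.Namespace()
        ns.x = []

        first = ns.x    # copy fetched from the manager's server process
        second = ns.x   # another, separate copy
        distinct = first is not second

        first.append(10)      # mutates only the local copy
        after_append = ns.x   # the server-side list is still empty

        return distinct, after_append


if __name__ == '__main__':
    print(demo())  # (True, [])
```

The same thing happens in the question's child process: `ns.x.append(10)` appends to a temporary copy that is discarded immediately.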

To propagate the changes, either use manager.list() objects for the nested lists too (requires Python 3.6 or newer), or modify the manager.list() object directly, for example by assigning the changed inner object back to it (see the note on manager.list in the documentation for Python 3.5 or older).

For example, consider the following code and its output:

    import multiprocessing

    def f(ns, ls, di):
        ns.x += 1
        ns.y[0] += 1      # unmanaged, not assigned back
        ns_z = ns.z       # unmanaged...
        ns_z[0] += 1
        ns.z = ns_z       # ... but assigned back

        ls[0] += 1
        ls[1][0] += 1     # unmanaged, not assigned back
        ls_2 = ls[2]      # unmanaged...
        ls_2[0] += 1
        ls[2] = ls_2      # ... but assigned back
        ls[3][0] += 1     # managed, direct manipulation

        di[0] += 1
        di[1][0] += 1     # unmanaged, not assigned back
        di_2 = di[2]      # unmanaged...
        di_2[0] += 1
        di[2] = di_2      # ... but assigned back
        di[3][0] += 1     # managed, direct manipulation

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        ns = manager.Namespace()
        ns.x = 1
        ns.y = [1]
        ns.z = [1]
        ls = manager.list([1, [1], [1], manager.list([1])])
        di = manager.dict({0: 1, 1: [1], 2: [1], 3: manager.list([1])})

        print('before', ns, ls, ls[2], di, di[2], sep='\n')
        p = multiprocessing.Process(target=f, args=(ns, ls, di))
        p.start()
        p.join()
        print('after', ns, ls, ls[2], di, di[2], sep='\n')

Output:

    before
    Namespace(x=1, y=[1], z=[1])
    [1, [1], [1], <ListProxy object, typeid 'list' at 0x10b8c4630>]
    [1]
    {0: 1, 1: [1], 2: [1], 3: <ListProxy object, typeid 'list' at 0x10b8c4978>}
    [1]
    after
    Namespace(x=2, y=[1], z=[2])
    [2, [1], [2], <ListProxy object, typeid 'list' at 0x10b8c4630>]
    [2]
    {0: 2, 1: [1], 2: [2], 3: <ListProxy object, typeid 'list' at 0x10b8c4978>}
    [2]

As you can see, when a new value is assigned directly to the managed container, the change is propagated; when a mutable container nested inside the managed container is modified in place, the change is not propagated; but if that nested container is then assigned back to the managed container, the change shows up. Using a nested managed container also works: changes are detected directly, without having to assign back to the parent container.
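Applied to the code in the question, the two approaches might look like the sketch below. The names `shared` and `run` are introduced here for illustration and are not part of the original code. The first option reads the list, changes it, and assigns it back to the Namespace; the second passes a manager.list() proxy to the worker and mutates it directly.

```python
import multiprocessing


def f(ns, shared):
    # Option 1: fetch a copy, modify it, then assign it back so the
    # proxy sends the new value to the manager.
    x = ns.x
    x.append(10)
    ns.x = x

    # Option 2: mutate a managed list through its proxy directly.
    shared.append(10)


def run():
    with multiprocessing.Manager() as manager:
        ns = manager.Namespace()
        ns.x = []
        shared = manager.list()

        p = multiprocessing.Process(target=f, args=(ns, shared))
        p.start()
        p.join()

        # Copy the results out before the manager shuts down.
        return ns.x, list(shared)


if __name__ == '__main__':
    print(run())  # ([10], [10])
```

Note that the read, modify, assign-back sequence in option 1 is not atomic, so with several worker processes updating the same attribute it would need a lock; with a single worker, as here, that is not a concern.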

Answered by senderle on Sep 30 '22 at 13:09