
Multiprocessing of shared list

I have written a program like this:

from multiprocessing import Process, Manager

def worker(i):
    x[i].append(i)

if __name__ == '__main__':
    manager = Manager()
    x = manager.list()
    for i in range(5):
        x.append([])
    p = []
    for i in range(5):
        p.append(Process(target=worker, args=(i,)))
        p[i].start()

    for i in range(5):
        p[i].join()

    print x

I want to create a list of lists that is shared among processes, and have each process modify one of the inner lists. But the result of this program is a list of empty lists: [[],[],[],[],[]].

What's going wrong?

Asked May 13 '14 at 05:05 by Eric Xu



1 Answer

I think this is because of a quirk in the way Managers are implemented.

If you create two Manager.list objects, and then append one of the lists to the other, the type of the list that you append changes inside the parent list:

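>>> from multiprocessing import Manager
>>> manager = Manager()
>>> l = manager.list()
>>> z = manager.list()
>>> type(l)
<class 'multiprocessing.managers.ListProxy'>
>>> type(z)
<class 'multiprocessing.managers.ListProxy'>
>>> l.append(z)
>>> type(l[0])
<class 'list'>   # Not a ListProxy anymore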

l[0] and z are not the same object, and don't behave quite the way you'd expect as a result:

>>> l[0].append("hi")
>>> print(z)
[]
>>> z.append("hi again")
>>> print(l[0])
['hi again']

As you can see, changing the nested list doesn't have any effect on the ListProxy object, but changing the ListProxy object does change the nested list. The documentation actually explicitly notes this:

Note

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy:
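To make that note concrete, here is a minimal sketch of the re-assignment pattern, assuming Python 3 and a manager list that holds plain nested lists. The names reassign_demo, shared and item are made up for illustration and are not from the documentation.

```python
from multiprocessing import Manager


def reassign_demo():
    """Compare an in-place change with a change that is re-assigned."""
    with Manager() as manager:
        shared = manager.list([[]])  # proxy around a list with one nested list

        # shared[0] hands back a copy, so this append never reaches the manager.
        shared[0].append("lost")
        after_in_place = list(shared)

        # Modify a local copy, then re-assign it through the proxy.
        item = shared[0]
        item.append("kept")
        shared[0] = item
        after_reassign = list(shared)

    return after_in_place, after_reassign


if __name__ == "__main__":
    print(reassign_demo())
```

Run as a script, this should print ([[]], [['kept']]): the in-place append is lost, while the re-assigned one sticks.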

Digging through the source code, you can see that when you call append on a ListProxy, the append call is actually sent to a manager object via IPC, and then the manager calls append on the shared list. That means that the args to append need to get pickled/unpickled. When the manager process unpickles a ListProxy that it owns, it gets back a regular Python list: the very list the ListProxy was pointing to (aka its referent). This is also noted in the documentation:

An important feature of proxy objects is that they are picklable so they can be passed between processes. Note, however, that if a proxy is sent to the corresponding manager’s process then unpickling it will produce the referent itself. This means, for example, that one shared object can contain a second

So, going back to the example above, why does updating z also update l[0], but not the other way around? Because the list stored inside l on the manager side is z's referent, so when you change the ListProxy (z in the example above), the nested list changes with it. Reading l[0], however, sends that nested list back over IPC, so what you get is a pickled copy. The copy knows nothing about the proxy, so when you change the copy, neither the proxy nor the shared list changes.
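The effect of that pickling step can be imitated without a manager at all. The sketch below is only an analogy, not the manager's actual code: simulate_ipc_read is a hypothetical helper that pickles and unpickles a value, the way an item is shipped back to the caller when it is read through a proxy.

```python
import pickle


def simulate_ipc_read(referent):
    """Return what a caller would receive: an unpickled copy of the referent."""
    return pickle.loads(pickle.dumps(referent))


if __name__ == "__main__":
    nested = []                        # stands in for the list held by the manager
    copy = simulate_ipc_read(nested)   # stands in for what l[0] returns
    copy.append("hi")                  # changes only the local copy
    nested.append("hi again")          # like z.append(...): changes the real list
    print(nested, copy)
```

The copy and the original drift apart as soon as either one is modified, which is the same asymmetry seen with l[0] and z.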

So, in order to make your example work, you need to create a new manager.list() object every time you want to modify a sublist, and only update that proxy object directly, rather than updating it via the index of the parent list:

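#!/usr/bin/python

from multiprocessing import Process, Manager

def worker(x, i):
    # `manager` is the module-level object created in the main block below,
    # so this relies on the fork start method to be visible in the child.
    sub_l = manager.list(x[i])   # new proxy seeded with a copy of x[i]
    sub_l.append(i)              # update that proxy directly
    x[i] = sub_l                 # re-assign so the parent list sees the change


if __name__ == '__main__':
    manager = Manager()
    x = manager.list([[]]*5)
    p = []
    for i in range(5):
        p.append(Process(target=worker, args=(x, i)))
        p[i].start()

    for i in range(5):
        p[i].join()

    print(x)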

Here's the output:

dan@dantop2:~$ ./multi_weirdness.py 
[[0], [1], [2], [3], [4]]
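
For comparison, here is a variant (not part of the answer as posted) that skips the extra manager.list() per sublist and just re-assigns a modified copy, as the documentation note suggests. It assumes Python 3; run and shared are illustrative names. Because only plain lists are stored, it should not depend on how a particular Python version handles proxies nested inside proxies, and each worker writes to its own index, so the workers don't overwrite each other.

```python
from multiprocessing import Manager, Process


def worker(shared, i):
    # Read a copy of sublist i, change it, and write it back through the proxy.
    sublist = shared[i]
    sublist.append(i)
    shared[i] = sublist


def run(n=5):
    """Start n workers, each updating its own sublist, and return the result."""
    with Manager() as manager:
        shared = manager.list([[] for _ in range(n)])
        procs = [Process(target=worker, args=(shared, i)) for i in range(n)]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return list(shared)  # copy the contents out before the manager shuts down


if __name__ == "__main__":
    print(run())
```

The shared list is passed to each worker explicitly, so nothing depends on a module-level manager being visible inside the child process.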
Answered Sep 20 '22 at 12:09 by dano