
Share data using Manager() in the Python multiprocessing module

I tried to share data using the multiprocessing module (Python 2.7, Linux), and I got different results from two slightly different versions of the code. Both versions share this setup:

import os
import time
from multiprocessing import Process, Manager

def editDict(d):
    d[1] = 10
    d[2] = 20
    d[3] = 30


pnum = 3
m = Manager()

1st version:

mlist = m.list()
for i in xrange(pnum):
    mdict = m.dict()
    mlist.append(mdict)
    p = Process(target=editDict,args=(mdict,))
    p.start()

time.sleep(2)
print 'after process finished', mlist

This generates:

after process finished [{1: 10, 2: 20, 3: 30}, {1: 10, 2: 20, 3: 30}, {1: 10, 2: 20, 3: 30}]

2nd version:

mlist = m.list([m.dict() for i in xrange(pnum)]) # main difference to 1st version
for i in xrange(pnum):
    p = Process(target=editDict,args=(mlist[i],))
    p.start()
time.sleep(2)
print 'after process finished', mlist

This generates:

after process finished [{}, {}, {}]

I do not understand why the outcome is so different.

Asked by HongboZhu, Dec 12 '11 15:12


1 Answer

The difference is that the second version retrieves each dict through the list proxy (mlist[i]), while the first version passes the dict proxy itself (mdict) to the process. As stated in the multiprocessing docs:

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified.

This means that, to keep track of items that are changed within a container (dictionary or list), you must reassign them after each edit. Consider the following change (for explanatory purposes, I'm not claiming this to be clean code):

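def editDict(d, l, i):
    d[1] = 10
    d[2] = 20
    d[3] = 30
    l[i] = d  # reassign so the list proxy picks up the modified dict

mlist = m.list([m.dict() for i in xrange(pnum)])
procs = []
for i in xrange(pnum):
    p = Process(target=editDict, args=(mlist[i], mlist, i))
    p.start()
    procs.append(p)

for p in procs:
    p.join()  # wait for the workers instead of sleeping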

If you now print mlist, you'll see that it has the same output as your first attempt. The reassignment lets the list proxy register the updated item again.
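The reassignment pattern can also be seen in isolation, without any worker processes. The following is a minimal sketch, written for Python 3 rather than the question's Python 2.7. It stores a plain dict in a managed list; the helper names (mutate_copy_only, mutate_and_reassign, demo) are hypothetical, and the fork start method is an assumption that matches the Linux setup in the question.

```python
import multiprocessing


def mutate_copy_only(lproxy):
    # Indexing the list proxy returns a local copy of the stored dict.
    item = lproxy[0]
    # This changes only the local copy; the manager is never told.
    item["a"] = 1


def mutate_and_reassign(lproxy):
    item = lproxy[0]
    item["a"] = 1
    # Assigning through the proxy sends the updated dict to the manager.
    lproxy[0] = item


def demo():
    # Assumption: the "fork" start method (Linux), as in the question.
    ctx = multiprocessing.get_context("fork")
    with ctx.Manager() as manager:
        lproxy = manager.list([{}])

        mutate_copy_only(lproxy)
        before = list(lproxy)

        mutate_and_reassign(lproxy)
        after = list(lproxy)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("without reassignment:", before)
    print("with reassignment:   ", after)
```

Only the second helper writes back through the proxy, so only its change should be visible when the list is read again.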

Your main issue in this case is that you have a dict (proxy) inside a list proxy: updates to the inner container are not noticed by the manager, so the list does not show the changes you expected. Note that the dictionary itself is updated in the second example; you just don't see it because the manager didn't sync.
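One way to avoid the nesting altogether is to do what the first version in the question effectively does and hand each worker its dict proxy directly. A rough Python 3 sketch follows. Keeping the proxies in an ordinary list, the names edit_dict and run_workers, and the fork start method are all assumptions made for illustration, not part of the original answer.

```python
import multiprocessing


def edit_dict(d):
    # d is a dict proxy, so each assignment is forwarded to the manager.
    d[1] = 10
    d[2] = 20
    d[3] = 30


def run_workers(pnum=3):
    # Assumption: the "fork" start method (Linux), as in the question.
    ctx = multiprocessing.get_context("fork")
    with ctx.Manager() as manager:
        # An ordinary list of dict proxies; no proxy is nested in another.
        proxies = [manager.dict() for _ in range(pnum)]

        procs = [ctx.Process(target=edit_dict, args=(p,)) for p in proxies]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()  # wait for the workers instead of sleeping

        # Copy the results into plain dicts before the manager shuts down.
        return [p.copy() for p in proxies]


if __name__ == "__main__":
    print(run_workers())
```

Because every worker talks to its own dict proxy, no reassignment step is needed, and joining the processes replaces the fixed two-second sleep.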

Answered by jro, Nov 12 '22 12:11