 

Is multiprocessing.Manager().dict().setdefault() broken?

The it's-late-and-I'm-probably-stupid department presents:

>>> import multiprocessing
>>> mgr = multiprocessing.Manager()
>>> d = mgr.dict()
>>> d.setdefault('foo', []).append({'bar': 'baz'})
>>> print d.items()
[('foo', [])]         <-- Where did the dict go?

Whereas:

>>> e = mgr.dict()
>>> e['foo'] = [{'bar': 'baz'}]
>>> print e.items()
[('foo', [{'bar': 'baz'}])]

Version:

>>> sys.version
'2.7.2+ (default, Jan 20 2012, 23:05:38) \n[GCC 4.6.2]'

Bug or wug?

EDIT: More of the same, on Python 3.2:

>>> sys.version
'3.2.2rc1 (default, Aug 14 2011, 21:09:07) \n[GCC 4.6.1]'

>>> e['foo'] = [{'bar': 'baz'}]
>>> print(e.items())
[('foo', [{'bar': 'baz'}])]

>>> id(type(e['foo']))
137341152
>>> id(type([]))
137341152

>>> e['foo'].append({'asdf': 'fdsa'})
>>> print(e.items())
[('foo', [{'bar': 'baz'}])]

How can the list in the dict proxy not contain the additional element?

Bittrance asked May 29 '12 22:05


1 Answer

This is some pretty interesting behavior. I am not sure of every implementation detail, but I'll take a crack at explaining why it behaves the way it does.

First, note that multiprocessing.Manager().dict() does not return a dict; it returns a DictProxy object:

>>> d = multiprocessing.Manager().dict()
>>> d
<DictProxy object, typeid 'dict' at 0x7fa2bbe8ea50>
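A quick way to confirm this from a script, sketched for Python 3 and assuming a POSIX system: the fork start method is selected explicitly with multiprocessing.get_context("fork") so the manager can be created at module top level (on Windows, or with the spawn start method, you would create it inside an if __name__ == "__main__": block instead). It also explains why the question found that type(e['foo']) is the built-in list: the value really is an ordinary list, rebuilt in your own process.

```python
import multiprocessing

# Assumption: POSIX only. "fork" lets this run at module top level; on other
# platforms, create the manager under `if __name__ == "__main__":` instead.
ctx = multiprocessing.get_context("fork")

with ctx.Manager() as mgr:
    d = mgr.dict()
    d['foo'] = []

    proxy_is_dict = isinstance(d, dict)  # the proxy itself is not a dict
    value_type = type(d['foo'])          # but a value comes back as a plain list
    snapshot = d.copy()                  # copy() returns an ordinary dict

print(proxy_is_dict, value_type, type(snapshot))  # False <class 'list'> <class 'dict'>
```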

The DictProxy gives you a dict that can be shared across processes. The real dict lives in the manager's server process: each method you call on the proxy (d['foo'], setdefault(), get() and so on) is sent to that process and executed there, and its return value is pickled and sent back to you. Whatever you get out of the proxy is therefore an ordinary local copy, not the object stored in the manager.

As a consequence, you cannot mutate objects nested inside a DictProxy in place. In your example, setdefault() returns a copy of the stored list, append() modifies that copy, and the copy is then thrown away, so the list held by the manager never changes.
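The copying can be modelled without any processes at all. ToyDictProxy below is a hypothetical, heavily simplified stand-in invented for illustration (it is not part of multiprocessing): it keeps the "server-side" dict privately and pickles every value that crosses the boundary, which is enough to reproduce the lost append from the question.

```python
import pickle


class ToyDictProxy:
    """Hypothetical, simplified model of a manager DictProxy.

    The real proxy talks to a separate server process; here the "server" is a
    private dict, and every value crossing the boundary is pickled, which is
    the part that matters for this behavior.
    """

    def __init__(self):
        self._server_side = {}  # plays the role of the dict in the manager process

    def _send(self, obj):
        # Simulate one trip over the connection: serialize, then deserialize.
        return pickle.loads(pickle.dumps(obj))

    def __setitem__(self, key, value):
        self._server_side[key] = self._send(value)

    def __getitem__(self, key):
        return self._send(self._server_side[key])

    def setdefault(self, key, default=None):
        value = self._server_side.setdefault(key, self._send(default))
        return self._send(value)  # the caller receives a copy, never the stored object

    def items(self):
        return self._send(list(self._server_side.items()))


d = ToyDictProxy()
d.setdefault('foo', []).append({'bar': 'baz'})  # appends to a throwaway copy
print(d.items())  # [('foo', [])]
```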

Here is some evidence that you only ever get a copy of a stored mutable object, which is the same thing that happens with setdefault():

>>> d['foo'] = []
>>> foo = d['foo']
>>> id(d['foo'])
140336914055536
>>> id(foo)
140336914056184

With a normal dictionary you would expect d['foo'] and foo to point to the same list object, so that modifying one would modify the other. As you have seen, this is not the case for DictProxy: every d['foo'] lookup returns a fresh copy sent back from the manager process.
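The same experiment against a real manager, as a Python 3 sketch that assumes a POSIX system (it selects the fork start method so it can run at module top level): two lookups of the same key give two distinct but equal objects, and appending to one of them leaves the stored value untouched.

```python
import multiprocessing

# Assumption: POSIX only; "fork" lets the manager be created at top level.
ctx = multiprocessing.get_context("fork")

with ctx.Manager() as mgr:
    d = mgr.dict()
    d['foo'] = []

    first = d['foo']
    second = d['foo']
    same_object = first is second  # False: each lookup is a fresh copy
    equal_value = first == second  # True: the copies have equal contents

    first.append({'bar': 'baz'})   # mutates only the local copy
    stored = d['foo']              # ask the manager again

print(same_object, equal_value, stored)  # False True []
```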

edit: The following note from the multiprocessing documentation clarifies what I was trying to say above:


Note: Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy:

# create a list proxy and append a mutable object (a dictionary)
lproxy = manager.list()
lproxy.append({})
# now mutate the dictionary
d = lproxy[0]
d['a'] = 1
d['b'] = 2
# at this point, the changes to d are not yet synced, but by
# reassigning the dictionary, the proxy is notified of the change
lproxy[0] = d
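A runnable version of that documentation snippet, as a sketch assuming Python 3 on a POSIX system (the fork start method is chosen explicitly so it runs at module top level). The extra before lookup shows that the mutation is invisible to the manager until the dictionary is reassigned.

```python
import multiprocessing

# Assumption: POSIX only; "fork" lets the manager be created at top level.
ctx = multiprocessing.get_context("fork")

with ctx.Manager() as manager:
    lproxy = manager.list()
    lproxy.append({})

    d = lproxy[0]       # a local copy of the stored dictionary
    d['a'] = 1
    d['b'] = 2
    before = lproxy[0]  # still {} in the manager: nothing has been sent yet

    lproxy[0] = d       # reassigning ships the modified copy back to the manager
    after = lproxy[0]

print(before, after)  # {} {'a': 1, 'b': 2}
```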

Based on the above information, here is how you could rewrite your original code to work with a DictProxy:

# d.setdefault('foo', []).append({'bar': 'baz'})
d['foo'] = d.get('foo', []) + [{'bar': 'baz'}]
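One caveat with the get() version: reading the list and writing it back are two separate round trips to the manager, so if several processes append to the same key at once, one can overwrite another's update. A minimal sketch that guards the read-modify-write with a manager lock; append_to_key is a hypothetical helper name, and the sketch assumes Python 3 on a POSIX system with the fork start method selected explicitly.

```python
import multiprocessing

# Assumption: POSIX only; "fork" lets this run at module top level.
ctx = multiprocessing.get_context("fork")


def append_to_key(shared, lock, key, item):
    """Hypothetical helper: a process-safe stand-in for setdefault(key, []).append(item)."""
    with lock:  # hold the lock across both the get() and the reassignment
        shared[key] = shared.get(key, []) + [item]


with ctx.Manager() as mgr:
    d = mgr.dict()
    lock = mgr.Lock()
    procs = [ctx.Process(target=append_to_key, args=(d, lock, 'foo', {'n': n}))
             for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    result = sorted(entry['n'] for entry in d['foo'])

print(result)  # [0, 1, 2, 3]
```

Without the lock, some of the four appends could be lost when two processes read the same old list before either writes it back.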

As Edward Loper suggested in the comments, I edited the code above to use get() instead of setdefault().
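On Python 3.6 and later, the multiprocessing documentation also allows shared objects to be nested, which gives another option: store a manager list proxy as the value. Looking up d['foo'] then returns a proxy rather than a copy, so append() is forwarded to the manager. A sketch assuming Python 3.6+ on a POSIX system (the fork start method is selected explicitly so it can run at module top level):

```python
import multiprocessing

# Assumption: POSIX only; "fork" lets the manager be created at top level.
ctx = multiprocessing.get_context("fork")

with ctx.Manager() as mgr:
    d = mgr.dict()
    if 'foo' not in d:
        d['foo'] = mgr.list()  # the value is itself a shared ListProxy

    d['foo'].append({'bar': 'baz'})    # d['foo'] returns a proxy, so this reaches the manager
    d['foo'].append({'asdf': 'fdsa'})
    stored = d['foo'][:]               # slicing returns an ordinary list copy

print(stored)  # [{'bar': 'baz'}, {'asdf': 'fdsa'}]
```

The trade-off is one extra shared object per key, and every access to the nested list is another round trip to the manager.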

Andrew Clark answered Sep 25 '22 23:09