The it's-late-and-I'm-probably-stupid department presents:
>>> import multiprocessing
>>> mgr = multiprocessing.Manager()
>>> d = mgr.dict()
>>> d.setdefault('foo', []).append({'bar': 'baz'})
>>> print d.items()
[('foo', [])] <-- Where did the dict go?
Whereas:
>>> e = mgr.dict()
>>> e['foo'] = [{'bar': 'baz'}]
>>> print e.items()
[('foo', [{'bar': 'baz'}])]
Version:
>>> sys.version
'2.7.2+ (default, Jan 20 2012, 23:05:38) \n[GCC 4.6.2]'
Bug or wug?
EDIT: More of the same, on Python 3.2:
>>> sys.version
'3.2.2rc1 (default, Aug 14 2011, 21:09:07) \n[GCC 4.6.1]'
>>> e['foo'] = [{'bar': 'baz'}]
>>> print(e.items())
[('foo', [{'bar': 'baz'}])]
>>> id(type(e['foo']))
137341152
>>> id(type([]))
137341152
>>> e['foo'].append({'asdf': 'fdsa'})
>>> print(e.items())
[('foo', [{'bar': 'baz'}])]
How can the list in the dict proxy not contain the additional element?
This is some pretty interesting behavior. I am not exactly sure how it works internally, but I'll take a crack at explaining why it behaves the way it does.
First, note that multiprocessing.Manager().dict() is not a dict; it is a DictProxy object:
>>> d = multiprocessing.Manager().dict()
>>> d
<DictProxy object, typeid 'dict' at 0x7fa2bbe8ea50>
The purpose of the DictProxy class is to give you a dict that can be shared safely across processes. The real dict lives in the manager's server process: every operation on the proxy is sent to that process and performed there, and any return value is pickled and sent back to you.
Apparently a consequence of this design is that you can't directly access mutable objects nested inside a DictProxy. Reading d['foo'] gives you a copy of the stored list, not the list itself. If you could get at the real object, you would be able to modify the shared data in a way the manager never finds out about.
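To make the copying behavior concrete, here is a toy model in plain Python. CopyingDict is a hypothetical class invented for this illustration, not multiprocessing's implementation. It assumes only that the real dict lives somewhere the caller can't reach and that every value going in or coming out is pickled, so the caller only ever holds copies. With just that assumption it reproduces the setdefault() result from the question, without starting any processes.

```python
import pickle


class CopyingDict:
    """Hypothetical toy model of a manager dict proxy, for illustration only.

    The real dict is private (standing in for the dict owned by the manager
    process), and every value going in or coming out is pickled, so the
    caller only ever holds copies.
    """

    def __init__(self):
        self._data = {}  # stands in for the dict inside the manager process

    @staticmethod
    def _over_the_wire(value):
        # Simulate sending a value between processes: pickle, then unpickle.
        return pickle.loads(pickle.dumps(value))

    def __getitem__(self, key):
        return self._over_the_wire(self._data[key])

    def __setitem__(self, key, value):
        self._data[key] = self._over_the_wire(value)

    def setdefault(self, key, default=None):
        stored = self._data.setdefault(key, self._over_the_wire(default))
        return self._over_the_wire(stored)

    def items(self):
        return self._over_the_wire(list(self._data.items()))


d = CopyingDict()
d.setdefault('foo', []).append({'bar': 'baz'})  # appends to a copy only
print(d.items())  # [('foo', [])]

d['foo'] = d['foo'] + [{'bar': 'baz'}]  # assigning back updates the stored value
print(d.items())  # [('foo', [{'bar': 'baz'}])]
```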
Here is some evidence that you can't get at the nested mutable objects directly, which is similar to what is going on with setdefault():
>>> d['foo'] = []
>>> foo = d['foo']
>>> id(d['foo'])
140336914055536
>>> id(foo)
140336914056184
With a normal dictionary you would expect d['foo'] and foo to point to the same list object, so that modifications to one would show up in the other. As you have seen, this is not the case for the DictProxy class: each lookup returns a fresh copy sent back by the manager process, so appending to it leaves the stored list untouched.
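Comparing id() values of temporary objects can be misleading in general, since CPython may reuse the id of an object that has already been freed. Holding both lookups and comparing them with is avoids that. The helper returns_same_object is a hypothetical name for this sketch; it assumes Python 3.3+ (for using Manager as a context manager), and the manager part sits under an if __name__ == '__main__': guard so it also works with the spawn start method.

```python
import multiprocessing


def returns_same_object(mapping, key):
    """True if two lookups of mapping[key] give back the very same object."""
    first = mapping[key]
    second = mapping[key]  # keep both alive so their identities can't be reused
    return first is second


if __name__ == '__main__':
    plain = {'foo': []}
    print(returns_same_object(plain, 'foo'))  # True: both lookups hit the stored list

    with multiprocessing.Manager() as mgr:
        shared = mgr.dict()
        shared['foo'] = []
        # Each lookup asks the manager process for the value and unpickles a new copy.
        print(returns_same_object(shared, 'foo'))  # False
```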
edit: The following note from the multiprocessing documentation clarifies what I was trying to say above:
Note: Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy:
# create a list proxy and append a mutable object (a dictionary)
lproxy = manager.list()
lproxy.append({})
# now mutate the dictionary
d = lproxy[0]
d['a'] = 1
d['b'] = 2
# at this point, the changes to d are not yet synced, but by
# reassigning the dictionary, the proxy is notified of the change
lproxy[0] = d
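The documentation snippet above assumes an existing manager object. A runnable version of the same read-modify-write pattern, written for Python 3, might look like this. update_item is a hypothetical helper name, and the manager code is guarded by if __name__ == '__main__': so it also works where processes are started with spawn.

```python
import multiprocessing


def update_item(container, key, **changes):
    """Read container[key], update it locally, then assign it back.

    With a manager proxy, container[key] is a copy, so the final
    assignment is what actually sends the change to the manager.
    """
    item = container[key]   # a copy when container is a proxy
    item.update(changes)    # only the local copy changes here
    container[key] = item   # re-assigning notifies the proxy
    return item


if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        lproxy = manager.list()
        lproxy.append({})

        # Mutating the copy alone never reaches the manager.
        d = lproxy[0]
        d['a'] = 1
        print(lproxy[0])  # {}

        update_item(lproxy, 0, a=1, b=2)
        print(lproxy[0])  # {'a': 1, 'b': 2}
```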
Based on the above information, here is how you could rewrite your original code to work with a DictProxy:
# d.setdefault('foo', []).append({'bar': 'baz'})
d['foo'] = d.get('foo', []) + [{'bar': 'baz'}]
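The get() version makes two separate round trips to the manager: one to read the list and one to store the new list. If several processes update the same key, one process can read before another's write lands, and an append is lost. The sketch below guards the read-modify-write with a manager lock. append_to_key and worker are hypothetical names, the four-worker setup is only for illustration, and it assumes Python 3.

```python
import multiprocessing


def append_to_key(shared, key, item, lock=None):
    """Append item to the list stored under key, creating the list if needed.

    Works with a plain dict or a manager dict proxy: a new list is built
    locally and assigned back, so a proxy receives the change.
    """
    if lock is None:
        shared[key] = shared.get(key, []) + [item]
        return
    # get() and the assignment are two separate calls to the manager;
    # holding the lock stops another process from writing in between.
    with lock:
        shared[key] = shared.get(key, []) + [item]


def worker(shared, lock, n):
    append_to_key(shared, 'foo', {'worker': n}, lock)


if __name__ == '__main__':
    with multiprocessing.Manager() as mgr:
        d = mgr.dict()
        lock = mgr.Lock()
        procs = [multiprocessing.Process(target=worker, args=(d, lock, n))
                 for n in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(len(d['foo']))  # 4: no append is lost while the lock is held
```

Without the lock, the final count can come out lower than 4 when the updates interleave. On Python 3.6 and later there is another option: a manager container can hold other proxies, so after d['foo'] = mgr.list(), calling d['foo'].append(...) is forwarded to the manager directly. That feature is newer than the Python versions in the question.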
As Edward Loper suggested in the comments, I edited the code above to use get() instead of setdefault().