Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using defaultdict with multiprocessing?

Just experimenting and learning, and I know how to create a shared dictionary that can be accessed with multiple proceses but I'm not sure how to keep the dict synced. defaultdict, I believe, illustrates the problem I'm having.

from collections import defaultdict
from multiprocessing import Pool, Manager, Process

#test without multiprocessing
s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1

print d.items() # Success! result: [('i', 4), ('p', 2), ('s', 4), ('m', 1)]
print '*'*10, ' with multiprocessing ', '*'*10

def test(k, multi_dict):
    multi_dict[k] += 1

if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = Manager()
    multi_d = mgr.dict()
    for k in s:
        pool.apply_async(test, (k, multi_d))

    # Mark pool as closed -- no more tasks can be added.
    pool.close()

    # Wait for tasks to exit
    pool.join()

    # Output results
    print multi_d.items()  #FAIL

print '*'*10, ' with multiprocessing and process module like on python site example', '*'*10
def test2(k, multi_dict2):
    multi_dict2[k] += 1


if __name__ == '__main__':
    manager = Manager()

    multi_d2 = manager.dict()
    for k in s:
        p = Process(target=test2, args=(k, multi_d2))
    p.start()
    p.join()

    print multi_d2 #FAIL

The first result works(because its not using multiprocessing), but I'm having problems getting it to work with multiprocessing. I'm not sure how to solve it but I think there might be due to it not being synced(and joining the results later) or maybe because within multiprocessing I cannot figure how to set defaultdict(int) to the dictionary.

Any help or suggestions on how to get this to work would be great!

like image 785
Lostsoul Avatar asked Feb 13 '12 07:02

Lostsoul


People also ask

When would you use a Defaultdict?

The Python defaultdict type behaves almost exactly like a regular Python dictionary, but if you try to access or modify a missing key, then defaultdict will automatically create the key and generate a default value for it. This makes defaultdict a valuable option for handling missing keys in dictionaries.

Why should I use Defaultdict?

defaultdict fills a specialized use-case, where you want values to be auto-created for you when a key is missing, and those values can be generated with a factory function without access to key being inserted. Note that exceptions are not just there to flag programmer error.

How does Defaultdict work How does Defaultdict work?

A defaultdict works exactly like a normal dict, but it is initialized with a function (“default factory”) that takes no arguments and provides the default value for a nonexistent key. A defaultdict will never raise a KeyError. Any key that does not exist gets the value returned by the default factory.

What is Lambda Defaultdict?

defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained. lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int) , which will do the same thing.


2 Answers

You can subclass BaseManager and register additional types for sharing. You need to provide a suitable proxy type in cases where the default AutoProxy-generated type does not work. For defaultdict, if you only need to access the attributes that are already present in dict, you can use DictProxy.

from multiprocessing import Pool
from multiprocessing.managers import BaseManager, DictProxy
from collections import defaultdict

class MyManager(BaseManager):
    pass

MyManager.register('defaultdict', defaultdict, DictProxy)

def test(k, multi_dict):
    multi_dict[k] += 1

if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = MyManager()
    mgr.start()
    multi_d = mgr.defaultdict(int)
    for k in 'mississippi':
        pool.apply_async(test, (k, multi_d))
    pool.close()
    pool.join()
    print multi_d.items()
like image 173
Janne Karila Avatar answered Sep 21 '22 17:09

Janne Karila


Well, the Manager class seems to supply only a fixed number of predefined data structures which can be shared among processes, and defaultdict is not among them. If you really just need that one defaultdict, the easiest solution would be to implement the defaulting behavior on your own:

def test(k, multi_dict):
    if k not in multi_dict:
        multi_dict[k] = 0
    multi_dict[k] += 1
like image 27
Simon Avatar answered Sep 21 '22 17:09

Simon