How can I access a shared dictionary with multiprocessing?

Tags:

python

I think I am following the Python documentation correctly, but I am having trouble getting the result I am looking for. I basically have a list of numbers that are passed to a function containing nested for loops, and the output is saved in a dictionary.

Here's the code:

from multiprocessing import Pool, Manager

list = [1,2,3,10]
dictionary = {}
def test(x, d):
    for xx in range(100):
        for xxx in range(100):
            dictionary[x]=xx*xxx



if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = Manager()
    d = mgr.dict()
    for N in list:
        pool.apply_async(test, (N, d))

    # Mark pool as closed -- no more tasks can be added.
    pool.close()

    # Wait for tasks to exit
    pool.join()

    # Output results
    print(d)

Here's the expected result:

{1: 9801, 2: 9801, 3: 9801, 10: 9801}

Any suggestions about what I'm doing wrong? Also, I haven't convinced myself that shared resources are the best approach (I'm thinking of using a database to maintain state), so if my approach is completely flawed or there's a better way to do this in Python, please let me know.

asked Feb 13 '12 by Lostsoul


People also ask

Is memory shared in multiprocessing?

shared_memory — Shared memory for direct access across processes. New in version 3.8. This module provides a class, SharedMemory, for the allocation and management of shared memory to be accessed by one or more processes on a multicore or symmetric multiprocessor (SMP) machine.
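A minimal sketch of that API (Python 3.8+); the byte payload here is just for illustration:

from multiprocessing import shared_memory

# Create a 16-byte block and write into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second process could attach to the same block by name.
other = shared_memory.SharedMemory(name=shm.name)
print(bytes(other.buf[:5]))  # b'hello'

other.close()
shm.close()
shm.unlink()  # release the block once every process is done with it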

How data is shared in multiprocessing in Python?

multiprocessing provides two methods of doing this: shared memory (suitable for simple values, arrays, or ctypes) and a Manager proxy, where one process holds the data and a manager arbitrates access to it from other processes (even over a network).
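For example, a counter and an array can be shared through multiprocessing.Value and multiprocessing.Array; a sketch (the worker() function is made up for illustration):

from multiprocessing import Process, Value, Array

def worker(counter, arr):
    with counter.get_lock():  # synchronize updates to the shared int
        counter.value += 1
    for i in range(len(arr)):
        arr[i] = -arr[i]      # changes are visible to the parent

if __name__ == '__main__':
    counter = Value('i', 0)            # shared C int
    arr = Array('d', [1.0, 2.0, 3.0])  # shared C double array
    p = Process(target=worker, args=(counter, arr))
    p.start()
    p.join()
    print(counter.value, arr[:])       # 1 [-1.0, -2.0, -3.0]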

How do you do multiprocessing in Python?

In this example, we first import the Process class, then create a Process object targeting the display() function. The process is started with the start() method and completed with the join() method. We can also pass arguments to the function using the args keyword.
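A sketch of that pattern (display() is the hypothetical target function the answer refers to):

from multiprocessing import Process

def display(name):
    print('Hello,', name)

if __name__ == '__main__':
    p = Process(target=display, args=('world',))
    p.start()  # launch the child process
    p.join()   # wait for it to finish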

What is multiprocessing dummy?

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module. That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs.
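A sketch of the drop-in API (square() is illustrative; thread pools like this are still useful for I/O-bound work):

# Thread-backed Pool: same interface as multiprocessing.Pool,
# but subject to the GIL for CPU-bound functions.
from multiprocessing.dummy import Pool as ThreadPool

def square(n):
    return n * n

if __name__ == '__main__':
    with ThreadPool(4) as pool:
        print(pool.map(square, [1, 2, 3, 10]))  # [1, 4, 9, 100]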


1 Answer

Change the definition of test to:

def test(x, d):
    for xx in range(100):
        for xxx in range(100):
            d[x] = xx*xxx  # write to the shared dict that was passed in

Otherwise you're just writing to a global dictionary (without synchronization) and never accessing it later.
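Put together, a sketch of the corrected program (same structure as the question; the input list is renamed numbers to avoid shadowing the builtin):

from multiprocessing import Pool, Manager

def test(x, d):
    for xx in range(100):
        for xxx in range(100):
            d[x] = xx*xxx

if __name__ == '__main__':
    numbers = [1, 2, 3, 10]
    pool = Pool(processes=4)
    mgr = Manager()
    d = mgr.dict()
    for N in numbers:
        pool.apply_async(test, (N, d))
    pool.close()
    pool.join()
    print(dict(d))  # {1: 9801, 2: 9801, 3: 9801, 10: 9801}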


As for the general approach, I think this one in particular creates a lot of contention on the shared dictionary. Do you really have to update it from each process that often? Accumulating batches of partial results in each process and updating the shared object only once in a while should perform better, as in the sketch below.
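A sketch of that idea, as a drop-in replacement for test() in the program above (test_batched is a name I made up):

def test_batched(x, d):
    local = {}                 # ordinary dict, private to this worker
    for xx in range(100):
        for xxx in range(100):
            local[x] = xx*xxx
    d.update(local)            # a single round trip to the manager per task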

answered Oct 20 '22 by Eli Bendersky