
Python sharing a dictionary between parallel processes

I want to share a dictionary between my processes as follows:

def f(y, x):
    y[x] = [x*x]

if __name__ == '__main__':
    pool = Pool(processes=4)
    inputs = range(10)
    y = {}
    result = pool.map(f, y, inputs)

Afterwards, y is still {}. How can I make it work?

Thanks,

asked Jun 13 '12 by Amir


People also ask

How do I share data between two processes in Python?

One common approach is multiprocessing.Pipe(), which returns a pair of connection objects; each end has two methods, send() and recv(), to communicate between processes.
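
For illustration, a minimal sketch using multiprocessing.Pipe() (the worker function and message here are made up for this example):

import multiprocessing as mp

def worker(conn):
    # send a Python object through the pipe to the parent process
    conn.send({"msg": "hello from child"})
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # {'msg': 'hello from child'}
    p.join()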

How do you pass multiple arguments in multiprocessing Python?

Use Pool.starmap(). The multiprocessing pool's starmap() function calls the target function with multiple arguments, so it can be used in place of map(). It is probably the preferred approach for executing a target function in the multiprocessing pool that takes multiple arguments.
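
A short sketch of starmap() with a two-argument function (the names here are illustrative):

import multiprocessing as mp

def power(x, p):
    return x ** p

if __name__ == '__main__':
    with mp.Pool() as pool:
        # each tuple is unpacked into power(x, p)
        results = pool.starmap(power, [(2, 2), (2, 3), (3, 3)])
    print(results)  # [4, 8, 27]

Note that starmap() was added in Python 3.3; the answer below predates it, which is why it unpacks tuples by hand.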

Does Python have multiprocessing?

Yes. Python's multiprocessing.Pool can be used for parallel execution of a function across multiple input values, distributing the input data across processes (data parallelism).

How does Python multiprocessing queue work?

A queue is a data structure to which items can be added with put() and from which items can be retrieved with get(). multiprocessing.Queue provides a first-in, first-out (FIFO) queue, meaning items are retrieved in the order they were added.
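
A minimal sketch of multiprocessing.Queue (the producer function is made up for this example):

import multiprocessing as mp

def producer(q):
    for i in range(3):
        q.put(i)  # items are added in order 0, 1, 2

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    p.start()
    print([q.get() for _ in range(3)])  # [0, 1, 2], FIFO order
    p.join()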


1 Answer

This looks like you are using the multiprocessing module. You didn't say, and that's an important bit of information.

The .map() method on a multiprocessing.Pool() instance takes two arguments: a function and a sequence. The function will be called with successive values from the sequence. (In your call, pool.map(f, y, inputs), the dict y is treated as the sequence, and inputs lands in the optional chunksize argument.)

You can't collect values in a mutable object like a dict (argument y in your example), because your code runs in multiple separate processes. Writing a value into a dict in a worker process doesn't send that value back to the original process. But if you use Pool.map(), the worker processes return the result of each function call back to the first process, and you can then collect those values to build a dict.

Example code:

import multiprocessing as mp

def f(x):
    return (x, x*x)

if __name__ == '__main__':
    pool = mp.Pool()
    inputs = range(10)
    result = dict(pool.map(f, inputs))

result is set to: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

Let's change it so that instead of computing x*x it will raise x to some power, and the power will be provided. And let's make it take a string key argument. This means that f() needs to take a tuple argument, where the tuple will be (key, x, p) and it will compute x**p.

import multiprocessing as mp

def f(tup):
    key, x, p = tup  # unpack tuple into variables
    return (key, x**p)

if __name__ == '__main__':
    pool = mp.Pool()
    inputs = [("1**1", 1, 1), ("2**2", 2, 2), ("2**3", 2, 3), ("3**3", 3, 3)]
    result = dict(pool.map(f, inputs))
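
result is set to: {'1**1': 1, '2**2': 4, '2**3': 8, '3**3': 27}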

If you have several sequences and you need to join them together to make a single sequence for the above, look into using zip() or perhaps itertools.product.
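
For instance, a sketch of building the (key, x, p) tuples with zip() (the lists here are made up for illustration):

keys = ["1**1", "2**2", "2**3", "3**3"]
xs = [1, 2, 2, 3]
ps = [1, 2, 3, 3]

inputs = list(zip(keys, xs, ps))
# [('1**1', 1, 1), ('2**2', 2, 2), ('2**3', 2, 3), ('3**3', 3, 3)]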

answered Oct 20 '22 by steveha