
Is there a simple process-based parallel map for python?

I'm looking for a simple process-based parallel map for python, that is, a function

parmap(function,[data]) 

that would run function on each element of [data] on a different process (well, on a different core, but AFAIK, the only way to run stuff on different cores in python is to start multiple interpreters), and return a list of results.

Does something like this exist? I would like something simple, so a simple module would be nice. Of course, if no such thing exists, I will settle for a big library :-/

asked Nov 09 '09 by static_rtti



2 Answers

It seems like what you need is the map method of multiprocessing.Pool():

map(func, iterable[, chunksize])

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready. This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.

For example, if you wanted to map this function:

def f(x):
    return x**2

to range(10), you could do it using the built-in map() function:

list(map(f, range(10)))

or using a multiprocessing.Pool() object's method map():

import multiprocessing

pool = multiprocessing.Pool()
print(pool.map(f, range(10)))
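A fuller, runnable sketch of the same idea: on platforms that use the "spawn" start method (Windows, and macOS on recent Pythons), worker processes re-import the main module, so the Pool should be created under an `if __name__ == "__main__":` guard, and the `with` block ensures the workers are cleaned up.

```python
import multiprocessing

def f(x):
    # The function to map; must be defined at module level so
    # worker processes can import it.
    return x ** 2

if __name__ == "__main__":
    # Pool() defaults to one worker per CPU core.
    with multiprocessing.Pool() as pool:
        results = pool.map(f, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that the function passed to pool.map must be picklable, which is why a lambda would not work here.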
answered Sep 19 '22 by Flávio Amieiro


This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

To parallelize your example, you'd need to define your map function with the @ray.remote decorator, and then invoke it with .remote. This ensures that every invocation of the remote function executes in a different process.

import time
import ray

ray.init()

# Define the function you want to apply map on, as a remote function.
@ray.remote
def f(x):
    # Do some work...
    time.sleep(1)
    return x * x

# Define a helper parmap(f, list) function.
# This function executes a copy of f() on each element in "list".
# Each copy of f() runs in a different process.
# Note f.remote(x) returns a future of its result (i.e.,
# an identifier of the result) rather than the result itself.
def parmap(f, list):
    return [f.remote(x) for x in list]

# Call parmap() on a list consisting of the first 5 integers.
result_ids = parmap(f, range(1, 6))

# Get the results.
results = ray.get(result_ids)
print(results)

This will print:

[1, 4, 9, 16, 25] 

and it will finish in approximately len(list)/p seconds (rounded up to the nearest integer), where p is the number of cores on your machine. On a machine with 2 cores, our example would take 5/2 rounded up, i.e., approximately 3 seconds.

There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.

answered Sep 22 '22 by Ion Stoica