I'm looking for a simple process-based parallel map for python, that is, a function
parmap(function,[data])
that would run function on each element of [data] on a different process (well, on a different core, but AFAIK, the only way to run stuff on different cores in python is to start multiple interpreters), and return a list of results.
Does something like this exist? I would like something simple, so a simple module would be nice. Of course, if no such thing exists, I will settle for a big library :-/
Parallel processing increases the number of tasks your program can run at the same time, which reduces the overall processing time and helps with large-scale problems.
To parallelize a loop, we can use the multiprocessing package in Python, which supports creating a child process at the request of another, ongoing process. The multiprocessing module can be used instead of a for loop to execute an operation on every element of an iterable.
In the sketch below, we first import the Process class and create a Process object that targets a display() function. The process is started with the start() method and waited on with the join() method. Arguments can be passed to the target function using the args keyword.
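A minimal, self-contained sketch of that pattern (display() is just an illustrative worker function, not something from the standard library):
from multiprocessing import Process

def display(name):
    # Worker function that runs in a separate child process.
    print("Hello,", name)

if __name__ == "__main__":
    # Create the child process and pass an argument via the args keyword.
    p = Process(target=display, args=("world",))
    p.start()   # launch the child process
    p.join()    # wait for it to finish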
It seems like what you need is the map method of multiprocessing.Pool():
map(func, iterable[, chunksize])
A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready. This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
For example, if you wanted to map this function:
def f(x): return x**2
to range(10), you could do it using the built-in map() function:
map(f, range(10))
or using a multiprocessing.Pool() object's method map():
import multiprocessing

pool = multiprocessing.Pool()
print(pool.map(f, range(10)))
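One caveat: on platforms that start worker processes with the "spawn" method (Windows, and macOS on recent Python versions), the pool must be created under an if __name__ == "__main__": guard, and using the pool as a context manager closes it cleanly. A minimal sketch of the same example with those details added:
import multiprocessing

def f(x):
    return x**2

if __name__ == "__main__":
    # The guard is required where child processes are started via "spawn".
    with multiprocessing.Pool() as pool:
        print(pool.map(f, range(10)))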
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your map function with the @ray.remote decorator, and then invoke it with .remote. This ensures that every instance of the remote function is executed in a different process.
import time
import ray

ray.init()

# Define the function you want to apply map on, as a remote function.
@ray.remote
def f(x):
    # Do some work...
    time.sleep(1)
    return x * x

# Define a helper parmap(f, list) function.
# This function executes a copy of f() on each element in "list".
# Each copy of f() runs in a different process.
# Note: f.remote(x) returns a future of its result (i.e.,
# an identifier of the result) rather than the result itself.
def parmap(f, list):
    return [f.remote(x) for x in list]

# Call parmap() on a list consisting of the first 5 integers.
result_ids = parmap(f, range(1, 6))

# Get the results.
results = ray.get(result_ids)
print(results)
This will print:
[1, 4, 9, 16, 25]
and it will finish in approximately len(list)/p seconds (rounded up to the nearest integer), where p is the number of cores on your machine. Assuming a machine with 2 cores, our example will execute in 5/2 rounded up, i.e., in approximately 3 seconds.
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
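For example, to run the same script against an existing cluster, you would, to my understanding, only need to change how ray.init() is called (assuming a cluster head node has already been started, e.g. with ray start --head):
# Connect to an already running Ray cluster instead of starting a local one.
ray.init(address="auto")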