Python 3: does Pool keep the original order of data passed to map?

I have written a little script to distribute the workload between 4 worker processes and to test whether the results stay ordered (with respect to the order of the input):

    from multiprocessing import Pool
    import numpy as np
    import time
    import random


    rows = 16
    columns = 1000000

    vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

    def worker(arr):
        time.sleep(random.random())        # let the process sleep a random
        for idx in np.ndindex(arr.shape):  # amount of time to ensure that
            arr[idx] += 1                  # the processes finish at different
                                           # time steps
        return arr

    # create the process pool
    with Pool(4) as p:
        # schedule one map/worker for each row in the original data
        q = p.map(worker, [row for row in vals])

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me this always results in:

    [00]:        1 -  1000000
    [01]:  1000001 -  2000000
    [02]:  2000001 -  3000000
    [03]:  3000001 -  4000000
    [04]:  4000001 -  5000000
    [05]:  5000001 -  6000000
    [06]:  6000001 -  7000000
    [07]:  7000001 -  8000000
    [08]:  8000001 -  9000000
    [09]:  9000001 - 10000000
    [10]: 10000001 - 11000000
    [11]: 11000001 - 12000000
    [12]: 12000001 - 13000000
    [13]: 13000001 - 14000000
    [14]: 14000001 - 15000000
    [15]: 15000001 - 16000000

Question: So, does Pool really keep the original input's order when storing the results of each map function in q?

Sidenote: I am asking because I need an easy way to parallelize work over several workers. In some cases the ordering is irrelevant. However, in other cases the results (like in q) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.

Performance: On my machine this operation is about 4 times faster (as expected, since I have 4 cores) than normal execution on a single process. Additionally, all 4 cores are at 100% usage during the runtime.

asked Dec 22 '16 00:12

daniel451

2 Answers

The documentation bills it as a "parallel equivalent of the map() built-in function". Since map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too.
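
One way to check this claim yourself is to compare Pool.map with the built-in map on the same input. The following is only a minimal sketch: the worker slow_square and the helper matches_builtin_map are hypothetical names made up for illustration, and the random sleep is there so that tasks are likely to finish out of order.

```python
import random
import time
from multiprocessing import Pool


def slow_square(x):
    # Hypothetical worker: the random sleep makes tasks finish out of order.
    time.sleep(random.random() / 100)
    return x * x


def matches_builtin_map(pool, items):
    # True if pool.map returns exactly what the built-in map would return.
    return pool.map(slow_square, items) == list(map(slow_square, items))


if __name__ == "__main__":
    with Pool(4) as p:
        print(matches_builtin_map(p, range(20)))
```

If the ordering guarantee holds, this should print True on every run, no matter which worker finishes first.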

answered Sep 20 '22 09:09

mgilson

Pool.map results are ordered. If you need order, great; if you don't, Pool.imap_unordered may be a useful optimization.

Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.
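
To see the difference between the two calls, you can run the same tasks through Pool.map and Pool.imap_unordered and compare what comes back. This is a rough sketch, not a benchmark: tagged_sleep and collect are hypothetical names, and each task just returns its own input index after a random sleep.

```python
import random
import time
from multiprocessing import Pool


def tagged_sleep(i):
    # Hypothetical worker: sleep a random time, then return the input index.
    time.sleep(random.random() / 50)
    return i


def collect(pool, n):
    # map: results come back in input order, whatever the completion order.
    ordered = pool.map(tagged_sleep, range(n))
    # imap_unordered: results are yielded as soon as each task completes.
    as_completed = list(pool.imap_unordered(tagged_sleep, range(n)))
    return ordered, as_completed


if __name__ == "__main__":
    with Pool(4) as p:
        ordered, as_completed = collect(p, 12)
    print(ordered)        # input order
    print(as_completed)   # same values, usually in a different order
```

For a reduce step that relies on ordered data, Pool.map (or Pool.imap, which is also ordered but lazy) is the safe choice; Pool.imap_unordered only fits cases where the order really does not matter.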

answered Sep 21 '22 09:09

user2357112 supports Monica

Tags: python, python-3.x, multithreading, multiprocessing, threadpool
