
Python 3: does Pool keep the original order of data passed to map?

I have written a little script to distribute workload between 4 worker processes and to test whether the results stay ordered (with respect to the order of the input):

    from multiprocessing import Pool
    import numpy as np
    import time
    import random


    rows = 16
    columns = 1000000

    vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

    def worker(arr):
        time.sleep(random.random())        # let the process sleep a random
        for idx in np.ndindex(arr.shape):  # amount of time to ensure that
            arr[idx] += 1                  # the processes finish at different
                                           # time steps
        return arr

    # create the threadpool
    with Pool(4) as p:
        # schedule one map/worker for each row in the original data
        q = p.map(worker, [row for row in vals])

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me this always results in:

    [00]:        1 -  1000000
    [01]:  1000001 -  2000000
    [02]:  2000001 -  3000000
    [03]:  3000001 -  4000000
    [04]:  4000001 -  5000000
    [05]:  5000001 -  6000000
    [06]:  6000001 -  7000000
    [07]:  7000001 -  8000000
    [08]:  8000001 -  9000000
    [09]:  9000001 - 10000000
    [10]: 10000001 - 11000000
    [11]: 11000001 - 12000000
    [12]: 12000001 - 13000000
    [13]: 13000001 - 14000000
    [14]: 14000001 - 15000000
    [15]: 15000001 - 16000000

Question: So, does Pool really keep the original input's order when storing the results of each map function in q?

Sidenote: I am asking this, because I need an easy way to parallelize work over several workers. In some cases the ordering is irrelevant. However, there are some cases where the results (like in q) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.

Performance: On my machine this operation is about 4 times faster (as expected, since I have 4 cores) than normal execution on a single process. Additionally, all 4 cores are at 100% usage during the runtime.

daniel451 asked Dec 22 '16 00:12



2 Answers

The documentation bills it as a "parallel equivalent of the map() built-in function". Since map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too.
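A minimal sketch of how one might check this equivalence directly. The names `slow_square` and `parallel_squares` are made up for illustration and are not part of the question's script; the random sleep only serves to make the tasks finish at different times.

```python
import random
import time
from multiprocessing import Pool


def slow_square(x):
    # Hypothetical worker: sleep for a short random time so that tasks
    # finish in a different order than they were submitted.
    time.sleep(random.random() * 0.05)
    return x * x


def parallel_squares(values, processes=4):
    # Pool.map returns a list whose i-th entry is the result for values[i],
    # regardless of which worker finished first.
    with Pool(processes) as pool:
        return pool.map(slow_square, values)


if __name__ == "__main__":
    values = list(range(20))
    # Compare against the built-in map, which is ordered by construction.
    assert parallel_squares(values) == list(map(slow_square, values))
    print("Pool.map matched the built-in map")
```

The `if __name__ == "__main__":` guard is there because platforms that start workers by spawning a fresh interpreter re-import the main module.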

mgilson answered Sep 20 '22 09:09


Pool.map results are ordered. If you need order, great; if you don't, Pool.imap_unordered may be a useful optimization.

Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.
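The difference between the two calls can be sketched as follows. This is an illustration under assumptions, not code from the question: `tagged_work`, `ordered_results` and `unordered_results` are hypothetical names, and each result carries its input index so the arrival order is visible.

```python
import random
import time
from multiprocessing import Pool


def tagged_work(i):
    # Hypothetical worker: a random delay makes the completion order
    # unpredictable; the input index is returned alongside the result.
    time.sleep(random.random() * 0.05)
    return i, i * 10


def ordered_results(n, processes=4):
    with Pool(processes) as pool:
        # map blocks until everything is done and returns input order.
        return pool.map(tagged_work, range(n))


def unordered_results(n, processes=4):
    with Pool(processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes it.
        return list(pool.imap_unordered(tagged_work, range(n)))


if __name__ == "__main__":
    print([i for i, _ in ordered_results(8)])    # input order
    print([i for i, _ in unordered_results(8)])  # some permutation of the inputs
```

If a later reduce step needs ordered data but `imap_unordered` is used anyway, returning the index with each result (as above) and sorting on it afterwards is one way to restore the input order.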

user2357112 supports Monica answered Sep 21 '22 09:09