I have written a little script to distribute workload between 4 worker processes and to test whether the results stay ordered (with respect to the order of the input):
```python
from multiprocessing import Pool
import numpy as np
import time
import random

rows = 16
columns = 1000000

vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

def worker(arr):
    time.sleep(random.random())        # let the process sleep a random
    for idx in np.ndindex(arr.shape):  # amount of time to ensure that
        arr[idx] += 1                  # the processes finish at different
                                       # time steps
    return arr

# create the threadpool
with Pool(4) as p:
    # schedule one map/worker for each row in the original data
    q = p.map(worker, [row for row in vals])

for idx, row in enumerate(q):
    print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))
```
For me this always results in:
```
[00]:        1 -  1000000
[01]:  1000001 -  2000000
[02]:  2000001 -  3000000
[03]:  3000001 -  4000000
[04]:  4000001 -  5000000
[05]:  5000001 -  6000000
[06]:  6000001 -  7000000
[07]:  7000001 -  8000000
[08]:  8000001 -  9000000
[09]:  9000001 - 10000000
[10]: 10000001 - 11000000
[11]: 11000001 - 12000000
[12]: 12000001 - 13000000
[13]: 13000001 - 14000000
[14]: 14000001 - 15000000
[15]: 15000001 - 16000000
```
Question: So, does `Pool` really keep the original input's order when storing the results of each `map` function in `q`?
Sidenote: I am asking this because I need an easy way to parallelize work over several workers. In some cases the ordering is irrelevant. However, there are some cases where the results (like in `q`) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.
Performance: On my machine this operation is about 4 times faster (as expected, since I have 4 cores) than normal execution on a single process. Additionally, all 4 cores are at 100% usage during the runtime.
The pool's `map` method chops the given iterable into a number of chunks, which it submits to the process pool as separate tasks. It is a parallel equivalent of the built-in `map` function, and it blocks the calling process until all computations have finished. `Pool` takes the number of worker processes as a parameter.
A process pool is a group of pre-instantiated, idle processes that stand ready to be given work. Creating a process pool is preferable to instantiating a new process for every task when there is a large number of tasks to run.
A taskel is a single execution of the function specified with the `func` parameter of a `Pool` method, called with arguments obtained from a single element of the transmitted chunk. A task consists of `chunksize` taskels.
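To make the chunking concrete, here is a minimal sketch. The names `square` and `squares_with_chunksize` are made up for illustration; only `Pool.map` and its `chunksize` argument come from the library. It suggests that changing `chunksize` changes how the input is batched into tasks, but not the order of the returned list.

```python
from multiprocessing import Pool


def square(x):
    # One call to this function processes one element of the input.
    return x * x


def squares_with_chunksize(values, chunksize):
    # chunksize sets how many consecutive elements are bundled into one task.
    # The returned list follows the input order regardless of the value chosen.
    with Pool(2) as pool:
        return pool.map(square, values, chunksize=chunksize)


if __name__ == "__main__":
    data = list(range(10))
    print(squares_with_chunksize(data, 1))  # ten tasks of one element each
    print(squares_with_chunksize(data, 5))  # two tasks of five elements each
```

Both calls should print the same list. A larger `chunksize` mainly reduces the overhead of passing many small tasks between processes.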
The documentation bills it as a "parallel equivalent of the `map()` built-in function". Since `map` is guaranteed to preserve order, `multiprocessing.Pool.map` makes that guarantee too.
`Pool.map` results are ordered. If you need order, great; if you don't, `Pool.imap_unordered` may be a useful optimization.
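A small sketch of the difference, assuming a hypothetical worker `slow_identity` that sleeps for the given number of seconds and a helper `compare` invented for this example:

```python
import time
from multiprocessing import Pool


def slow_identity(delay):
    # Sleep for `delay` seconds, then hand the same value back.
    time.sleep(delay)
    return delay


def compare(delays):
    # One worker per item, so every item can start right away.
    with Pool(len(delays)) as pool:
        ordered = pool.map(slow_identity, delays)
        unordered = list(pool.imap_unordered(slow_identity, delays))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = compare([0.6, 0.4, 0.2])
    print("map:           ", ordered)    # same order as the input
    print("imap_unordered:", unordered)  # order of completion, may vary
```

With these delays the second list will typically come back shortest-delay first, but that order is not guaranteed. The practical benefit of `imap_unordered` is that each result can be consumed as soon as it is ready instead of waiting for earlier items.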
Note that while the order in which you receive the results from `Pool.map` is fixed, the order in which they are computed is arbitrary.
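One way to observe this is to have each call report when it finished. The sketch below uses illustrative names (`timed_task`, `received_and_finished`) and assumes the wall clock is comparable across worker processes on the same machine:

```python
import time
from multiprocessing import Pool


def timed_task(args):
    index, delay = args
    time.sleep(delay)
    # Report which input this was and the wall-clock time it finished.
    return index, time.time()


def received_and_finished(delays):
    with Pool(len(delays)) as pool:
        results = pool.map(timed_task, list(enumerate(delays)), chunksize=1)
    # Order in which map handed the results back (always the input order).
    received = [index for index, _ in results]
    # Order in which the calls actually completed, by finish timestamp.
    finished = [index for index, _ in sorted(results, key=lambda r: r[1])]
    return received, finished


if __name__ == "__main__":
    received, finished = received_and_finished([0.6, 0.4, 0.2])
    print("received:", received)
    print("finished:", finished)
```

The `received` list matches the input order, while `finished` reflects whichever calls completed first and can differ from run to run.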