How can I make multiprocessing.pool.map distribute processes in numerical order?
More Info:
I have a program which processes a few thousand data files, making a plot of each one. I'm using multiprocessing.pool.map to distribute each file to a processor, and it works great. Sometimes this takes a long time, and it would be nice to look at the output images while the program is running. That would be a lot easier if the map distributed the snapshots in order; instead, for the particular run I just executed, the first 8 snapshots analyzed were: 0, 78, 156, 234, 312, 390, 468, 546. Is there a way to make it distribute them closer to numerical order?
Example:
Here's some sample code which contains the same key elements and shows the same basic result:
import sys
import time
from multiprocessing import Pool

num_proc = 4
num_calls = 20
sleeper = 0.1

def SomeFunc(arg):
    time.sleep(sleeper)
    print("%5d" % arg, end=" ")
    sys.stdout.flush()  # otherwise doesn't print properly on a single line

if __name__ == "__main__":  # guard needed where worker processes are spawned (e.g. Windows)
    proc_pool = Pool(num_proc)
    proc_pool.map(SomeFunc, range(num_calls))
Yields:
0 4 2 6 1 5 3 7 8 10 12 14 13 11 9 15 16 18 17 19
From @Hayden: use the chunksize parameter of def map(self, func, iterable, chunksize=None).
More Info:
The chunksize determines how many iterations are allocated to each processor at a time. My example above, for instance, uses a chunksize of 2, which means that each processor goes off and runs the function for 2 iterations, then comes back for more (a 'check-in'). The trade-off behind chunksize is that the check-in has overhead when a processor has to sync up with the others, which suggests you want a large chunksize. On the other hand, with large chunks one processor might finish its chunk while another still has a long way to go, which suggests a small chunksize. The additional useful information is how much the duration of each function call can vary. If the calls really should all take the same amount of time, it's far more efficient to use a large chunksize. But if some calls could take twice as long as others, you want a small chunksize so that processors aren't caught waiting.
For my problem, every function call should take very close to the same amount of time (I think), so if I want the processes to be called in order, I'm going to sacrifice efficiency to the check-in overhead.
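As a rough sketch of how you might measure that trade-off for your own workload (the 0.01 s sleep is just a stand-in for a roughly uniform task, and absolute timings will vary by machine):
import time
from multiprocessing import Pool

def work(arg):
    time.sleep(0.01)  # stand-in for a roughly uniform task

if __name__ == "__main__":
    for cs in (1, 2, 5, 10):
        pool = Pool(4)
        start = time.time()
        pool.map(work, range(200), chunksize=cs)
        pool.close()
        pool.join()
        print("chunksize=%2d took %.3fs" % (cs, time.time() - start))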
Since the built-in map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too: the returned results are in input order even when the calls execute out of order.
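A minimal sketch demonstrating this (square and the random sleep are just illustrative): the workers finish in a scrambled order, but the returned list is in input order.
from multiprocessing import Pool
import random
import time

def square(x):
    time.sleep(random.random() * 0.1)  # workers finish in a scrambled order
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(square, range(10))
    print(results)  # always [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]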
Pool.map is the parallel equivalent of the built-in map: it chops the given iterable into a number of chunks, submits them to the process pool as separate tasks, and blocks the main process until all computations finish. The Pool itself takes the number of worker processes as a parameter.
The reason this occurs is that each process is given a predefined amount of work to do at the start of the call to map, which depends on the chunksize. We can work out the default chunksize by looking at the source for pool.map:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
So for a range of 20, and with 4 processes, we will get a chunksize of 2.
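As a quick sanity check, here's a small standalone sketch of that arithmetic (default_chunksize is a hypothetical helper, not library API):
def default_chunksize(num_items, num_procs):
    # mirrors the divmod logic from Pool.map's source shown above
    chunksize, extra = divmod(num_items, num_procs * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_chunksize(20, 4))  # divmod(20, 16) -> (1, 4), so 1 + 1 = 2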
If we modify your code to pass this chunksize explicitly, we should get results similar to what you are seeing now:
proc_pool.map(SomeFunc, range(num_calls), chunksize=2)
This yields the output:
0 2 6 4 1 7 5 3 8 10 12 14 9 13 15 11 16 18 17 19
Now, setting chunksize=1 will ensure that each process within the pool is given only one task at a time.
proc_pool.map(SomeFunc, range(num_calls), chunksize=1)
This should ensure a reasonably good numerical ordering compared to when no chunksize is specified. For example, a chunksize of 1 yields the output:
0 1 2 3 4 5 6 7 9 10 8 11 13 12 15 14 16 17 19 18
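As an aside, if the real goal is to look at each image as soon as it is ready, pool.imap returns an iterator that yields results in input order as they complete. A sketch, where MakePlot and the file names are hypothetical stand-ins for your actual plotting code:
from multiprocessing import Pool

def MakePlot(filename):
    # hypothetical stand-in: process one data file, write one image
    return filename

if __name__ == "__main__":
    files = ["data_%04d.dat" % i for i in range(20)]  # hypothetical file names
    with Pool(4) as pool:
        for done in pool.imap(MakePlot, files, chunksize=1):
            print("finished", done)  # yielded in input order as results complete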