I am processing some ASCII data, performing some operations, and then writing everything back to another file (the job is done by post_processing_0.main, which returns nothing). I want to parallelize the code with the multiprocessing module; see the following code snippet:
from multiprocessing import Pool
import post_processing_0

def chunks(lst, n):
    return [lst[i::n] for i in xrange(n)]

def main():
    pool = Pool(processes=proc_num)
    P = {}
    for i in range(0, proc_num):
        P['process_' + str(i)] = pool.apply_async(post_processing_0.main, [split_list[i]])
    pool.close()
    pool.join()

proc_num = 8
timesteps = 100
list_to_do = range(0, timesteps)
split_list = chunks(list_to_do, proc_num)

main()
I read about the difference between map and async, but I don't understand it very well. Is my use of the multiprocessing module correct?
In this case, should I use map_async or apply_async, and why?
Edit:
I don't think this is a duplicate of the question Python multiprocessing.Pool: when to use apply, apply_async or map?. There, the answers focus on the order in which results are returned by the two functions. Here I am asking: what is the difference when nothing is returned?
apply_async submits a single job to the pool. map_async submits multiple jobs, calling the same function with different arguments. The former takes a function plus an argument list; the latter takes a function plus an iterable (i.e. a sequence) representing the arguments. map_async can only call unary functions (i.e. functions taking one argument).
In your case, it might be better to restructure the code slightly to put all your arguments in a single list and just call map_async once with that list.
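For illustration, here is a minimal sketch of the two call styles; do_work and chunks_of_work are hypothetical stand-ins for post_processing_0.main and split_list:
from multiprocessing import Pool

def do_work(chunk):
    # hypothetical unary worker: takes one argument, returns nothing
    print("processing %s" % chunk)

if __name__ == "__main__":
    chunks_of_work = [[0, 1], [2, 3], [4, 5]]
    pool = Pool(processes=3)

    # apply_async: one call per job, each submitting a single task
    results = [pool.apply_async(do_work, [c]) for c in chunks_of_work]

    # map_async: one call submits all jobs; do_work gets one item of the iterable per task
    async_result = pool.map_async(do_work, chunks_of_work)

    pool.close()
    pool.join()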
I would recommend map_async for three reasons:
It's cleaner-looking code. This:
pool = Pool(processes=proc_num)
async_result = pool.map_async(post_processing_0.main, split_list)
pool.close()
pool.join()
looks nicer than this:
pool = Pool(processes=proc_num)
P = {}
for i in range(0, proc_num):
    P['process_' + str(i)] = pool.apply_async(post_processing_0.main, [split_list[i]])
pool.close()
pool.join()
With apply_async, if an exception occurs inside post_processing_0.main, you won't know about it unless you explicitly call P['process_x'].get() on the failing AsyncResult object, which would require iterating over all of P. With map_async, the exception will be raised when you call async_result.get() - no iteration required.
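As a rough sketch of that difference in error handling (again using a hypothetical do_work that fails on one input):
from multiprocessing import Pool

def do_work(chunk):
    # hypothetical worker that raises for an empty chunk
    if not chunk:
        raise ValueError("empty chunk")

if __name__ == "__main__":
    chunks_of_work = [[0, 1], [], [4, 5]]  # the second chunk will fail

    # apply_async: every AsyncResult must be polled individually to see the error
    pool = Pool(processes=3)
    results = [pool.apply_async(do_work, [c]) for c in chunks_of_work]
    pool.close()
    pool.join()
    for r in results:
        try:
            r.get()
        except ValueError as exc:
            print("a job failed: %s" % exc)

    # map_async: a single get() re-raises the exception from the failing job
    pool = Pool(processes=3)
    async_result = pool.map_async(do_work, chunks_of_work)
    pool.close()
    pool.join()
    try:
        async_result.get()
    except ValueError as exc:
        print("some job failed: %s" % exc)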
map_async has built-in chunking functionality, which will make your code perform noticeably better if split_list is very large.
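map_async also accepts a chunksize argument, so the batching can be tuned explicitly; a small sketch (the worker and the numbers are made up):
from multiprocessing import Pool

def do_work(item):
    # hypothetical cheap per-item worker
    return item * item

if __name__ == "__main__":
    big_list = range(100000)

    pool = Pool(processes=8)
    # each task handed to a worker covers 1000 items, reducing the
    # inter-process communication overhead for very large inputs
    async_result = pool.map_async(do_work, big_list, chunksize=1000)
    pool.close()
    pool.join()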
Other than that, the behavior is basically the same if you don't care about the results.