I have written a Python program using multiprocessing. The program spawns 8 workers, each of which prints a random number after sleeping for 3 seconds. I expected the program to finish in about 3 seconds, but it finishes in 24 seconds, as if each worker function were evaluated sequentially rather than in parallel. Any idea why?
import time
import multiprocessing as mp

import numpy as np

def f(i):
    np.random.seed(int(time.time() + i))
    time.sleep(3)
    res = np.random.rand()
    print("From i =", i, "res =", res)

if __name__ == '__main__':
    num_workers = mp.cpu_count()  # My CPU has 8 cores.
    pool = mp.Pool(num_workers)
    for i in range(num_workers):
        p = pool.apply_async(f, args=(i,))
        p.get()
    pool.close()
    pool.join()
However, if I use Process instead of Pool, I get the parallel behaviour I expected:
import time
import multiprocessing as mp

import numpy as np

def f(i):
    np.random.seed(int(time.time() + i))
    time.sleep(3)
    res = np.random.rand()
    print("From i =", i, "res =", res)
    if res > 0.7:
        print("find it")

if __name__ == '__main__':
    num_workers = mp.cpu_count()
    pool = mp.Pool(num_workers)  # note: this pool is never actually used below
    for i in range(num_workers):
        p = mp.Process(target=f, args=(i,))
        p.start()
Think about what you're doing here:
for i in range(num_workers):
    p = pool.apply_async(f, args=(i,))
    p.get()
Each time through the loop, you send some work off to a pool process, and then (via .get()) you explicitly wait for that process to return its result before submitting the next job. So of course nothing much happens in parallel.
The usual way to do this is more like:
workers = [pool.apply_async(f, args=(i,)) for i in range(num_workers)]
for w in workers:
    w.get()
That is, you start as many workers going as you want before you wait for any of them.