I'm using the Python multiprocessing Pool module to create a pool of processes and assign jobs to it.
I created 4 processes and assigned 2 jobs, and I print each worker's process id, but I only see a single process id, "6952". Shouldn't it print 2 different process ids?
from multiprocessing import Pool
from time import sleep

def f(x):
    import os
    print "process id = ", os.getpid()
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.map_async(f, (11,))   # start job 1
    result1 = pool.map_async(f, (10,))  # start job 2
    print "result = ", result.get(timeout=1)
    print "result1 = ", result1.get(timeout=1)
Result:
result = process id = 6952
process id = 6952
[121]
result1 = [100]
It's just down to timing. Windows needs to spawn 4 processes in the Pool, which then need to start up, initialize, and prepare to consume from the Queue. On Windows, this requires each child process to re-import the __main__ module, and requires the Queue instances used internally by the Pool to be unpickled in each child. This takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls are executed before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function run by each worker in the Pool (the worker function in multiprocessing/pool.py):
while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()  # This is getting the task from the parent process
        print("got {}".format(current_process()))
Output:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result = [121]
result1 = [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
As you can see, Worker-1 starts up and consumes both tasks before workers 2-4 ever try to consume from the Queue. If you add a sleep call after you instantiate the Pool in the main process, but before calling map_async, you'll see different processes handle each request (a sketch of that change follows the output below):
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id = 5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result = [121]
result1 = [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
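For reference, here's a minimal sketch of that sleep-based change (Python 3 syntax; the two-second duration is just a guess, anything long enough for all four workers to finish starting up will do):
import os
from multiprocessing import Pool
from time import sleep

def f(x):
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    sleep(2)                            # give all 4 workers time to start up
    result = pool.map_async(f, (11,))   # start job 1
    result1 = pool.map_async(f, (10,))  # start job 2
    print("result =", result.get(timeout=1))
    print("result1 =", result1.get(timeout=1))
    pool.close()                        # no more tasks; workers get shutdown sentinels
    pool.join()                         # wait for the workers to exit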
(Note that the extra "getting"/"got" statements you see are sentinels being sent to each process to gracefully shut them down.)
Using Python 3.x on Linux, I'm able to reproduce this behavior with the 'spawn' and 'forkserver' contexts, but not with 'fork'. Presumably that's because forking the child processes is much faster than spawning them and re-importing __main__.
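If you want to experiment with the start methods yourself (Python 3.4+), you can request one explicitly with multiprocessing.get_context(); this is just a sketch, not code from the question:
import multiprocessing
import os

def f(x):
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    # 'spawn' and 'forkserver' start workers slowly enough to reproduce the
    # behavior above; 'fork' (Unix-only, the default on Linux) usually has
    # every worker ready before the first task arrives.
    ctx = multiprocessing.get_context('spawn')
    pool = ctx.Pool(processes=4)
    print(pool.map(f, (11, 10)))
    pool.close()
    pool.join()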