Python multiprocessing process number

I'm using Python's multiprocessing Pool module to create a pool of processes and assign jobs to it.

I created 4 processes and assigned 2 jobs, and I'm trying to display each job's process id, but in the output I only see one process id, "6952". Shouldn't it print 2 process ids?

from multiprocessing import Pool
from time import sleep
import os

def f(x):
    print "process id = ", os.getpid()
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result  =  pool.map_async(f, (11,))   #Start job 1 
    result1 =  pool.map_async(f, (10,))   #Start job 2
    print "result = ", result.get(timeout=1)  
    print "result1 = ", result1.get(timeout=1)

Result:

result = process id =  6952
process id =  6952
 [121]
result1 =  [100]
asked Nov 01 '14 by user1050619


1 Answer

It's just down to timing. Windows needs to spawn 4 processes in the Pool, which then need to start up, initialize, and prepare to consume from the Queue. On Windows, this requires each child process to re-import the __main__ module, and for the Queue instances used internally by the Pool to be unpickled in each child. This takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls are executed before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function run by each worker in the Pool:

while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()  # This is getting the task from the parent process
        print("got {}".format(current_process()))

Output:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result =  [121]
result1 =  [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>

As you can see, Worker-1 starts up and consumes both tasks before workers 2-4 ever try to consume from the Queue. If you add a sleep call after you instantiate the Pool in the main process, but before calling map_async, you'll see different processes handle each request:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id =  5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result =  [121]
result1 =  [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>

(Note that the extra "getting"/"got" statements you see are sentinels being sent to each worker to shut it down gracefully.)
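The timing fix described above can be sketched like this, assuming Python 3 (the run_jobs helper is just for illustration — the key line is the sleep between creating the Pool and submitting work):

```python
from multiprocessing import Pool
from time import sleep
import os

def f(x):
    print("process id =", os.getpid())
    return x * x

def run_jobs():
    pool = Pool(processes=4)
    sleep(2)  # give all four workers time to start and block on the task queue
    result = pool.map_async(f, (11,))   # job 1
    result1 = pool.map_async(f, (10,))  # job 2
    out = (result.get(timeout=5), result1.get(timeout=5))
    pool.close()
    pool.join()
    return out

if __name__ == '__main__':
    print(run_jobs())
```

With the workers already idle and waiting on the queue when the jobs arrive, the two tasks are much more likely to be picked up by different workers, so the two printed pids will usually differ.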

Using Python 3.x on Linux, I'm able to reproduce this behavior using the 'spawn' and 'forkserver' contexts, but not 'fork' — presumably because forking the child processes is much faster than spawning them and re-importing __main__.
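You can check the start-method difference yourself with multiprocessing.get_context(). A sketch (square and run are illustrative names; 'fork' is Unix-only):

```python
import os
from multiprocessing import get_context

def square(x):
    # return the worker's pid alongside the result so the caller
    # can see which process handled each task
    return (os.getpid(), x * x)

def run(start_method):
    ctx = get_context(start_method)
    with ctx.Pool(processes=4) as pool:
        r1 = pool.map_async(square, (11,))
        r2 = pool.map_async(square, (10,))
        return r1.get(timeout=10), r2.get(timeout=10)

if __name__ == '__main__':
    # With 'fork', workers are ready almost instantly, so the two tasks
    # are more likely to land on different pids; with 'spawn' or
    # 'forkserver', one worker often grabs both while the rest start up.
    print(run('fork'))
```

Comparing run('fork') against run('spawn') on the same machine should make the startup-latency difference visible in which pids appear.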

answered Nov 10 '22 by dano