Python, Multiprocessing: what does process.join() do?

Question

import time
from multiprocessing import Process

def loop(limit):
    for i in xrange(limit):
        pass
    print i

limit = 100000000 #100 million

start = time.time()    

for i in xrange(5):
    p = Process(target=loop, args=(limit,))
    p.start()
p.join()

end = time.time()
print end - start

I tried running this code, and this is the output I get:

99999999
99999999
2.73401999474
99999999
99999999
99999999

and sometimes

99999999
99999999
3.72434902191
99999999
99999999
99999999
99999999
99999999

In the second case the loop function appears to be called 7 times instead of 5. Why does this strange behaviour occur?

I am also confused about the role of the p.join() statement. Is it ending any one process or all of them at the same time?

Songy · Accepted Answer

As your code is written, p.join() waits only for the last process you started to finish before moving on to the next section of code. If you walk through what you have done, you should see why you get the "strange" output.

for i in xrange(5):
    p = Process(target=loop, args=(limit,))
    p.start()

This starts 5 new processes one after the other, and they all run at the same time. Or nearly so: it is up to the scheduler to decide which process gets the CPU at any given moment.

This means you have 5 processes running now:

Process 1

Process 2

Process 3

Process 4

Process 5

p.join()

This waits for the process that p refers to. That is Process 5, as it was the last process to be assigned to p.

Let's now say that Process 2 finishes first, followed by Process 5, which is perfectly feasible as the scheduler could give those processes more time on the CPU.

Process 1

Process 2 prints 99999999

Process 3

Process 4

Process 5 prints 99999999

The p.join() call now returns and the script moves on to the next part, as Process 5 (the process p refers to) has finished.

end = time.time()
print end - start

This section prints the elapsed time, and there are 3 processes still running after this output.

The other processes then finish and print their 99999999.
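The walkthrough above can be reproduced in a small sketch. It is written for Python 3 rather than the Python 2 of the question, and it makes some assumptions for illustration: the names worker and join_only_last are hypothetical, and time.sleep stands in for the counting loop so that the last process reliably finishes first.

```python
import time
from multiprocessing import Process


def worker(seconds):
    # Stand-in for the question's loop(): sleep instead of counting.
    time.sleep(seconds)


def join_only_last(durations):
    """Start one process per duration, join only the last, report who is alive."""
    procs = []
    for seconds in durations:
        p = Process(target=worker, args=(seconds,))
        p.start()
        procs.append(p)
    p.join()  # p is the last process started, exactly as in the question
    still_running = [q.is_alive() for q in procs]
    for q in procs:
        q.join()  # wait for the rest so no child is left behind
    return still_running


if __name__ == "__main__":
    # The last process sleeps the shortest time, so it finishes first.
    print(join_only_last([1.0, 1.0, 0.1]))
```

The last entry of the returned list is always False, because p.join() has already returned for that process. The earlier entries would normally be True here, which is the situation where the timing line is printed while other processes are still running.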

To fix this behaviour you need to call .join() on every process. You could change your code to this:

processes = []

for i in xrange(5):
    p = Process(target=loop, args=(limit,))
    p.start()
    processes.append(p)

for process in processes:
    process.join()

This will wait for the first process, then the second, and so on. It won't matter if one process finishes before another, because every process in the list must be waited on before the script continues.
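For anyone on Python 3, here is a minimal sketch of the same fix with the timing included. The names worker and run_and_join_all are made up for this example, and short sleeps are assumed in place of the 100 million iteration loop so that it runs quickly.

```python
import time
from multiprocessing import Process


def worker(seconds):
    # Stand-in for the question's loop(): sleep instead of counting.
    time.sleep(seconds)


def run_and_join_all(durations):
    """Start one process per duration, join every one, return timing info."""
    start = time.perf_counter()
    processes = []
    for seconds in durations:
        p = Process(target=worker, args=(seconds,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # returns immediately if p has already finished
    elapsed = time.perf_counter() - start
    return elapsed, [p.exitcode for p in processes]


if __name__ == "__main__":
    elapsed, exitcodes = run_and_join_all([0.3, 0.1, 0.2])
    print("all done after %.2f s, exit codes %s" % (elapsed, exitcodes))
```

Because the elapsed time is taken only after the second loop, it cannot be shorter than the longest worker. Since the workers run concurrently, it should also be close to the longest duration rather than the sum of all of them.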

nick_v1 · Answer

There are some problems with the way you are doing things. Try this:

start = time.time()    
procs = []
for i in xrange(5):
    p = Process(target=loop, args=(limit,))
    p.start()
    procs.append(p)
for p in procs:
    p.join()

The problem is that you are not keeping track of the individual processes (the p variables inside the loop). You need to keep them around so you can interact with them. This update keeps them in a list and then joins all of them at the end.

Output looks like this:

99999999
99999999
99999999
99999999
99999999
6.29328012466

Note that the elapsed time is now printed last, after all the processes have finished.

Also, I ran your code and was not able to get the loop to execute multiple times.
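On the question of whether join() ends a process: it does not. It only waits. Keeping a reference to the Process object lets you check this yourself. The sketch below is for Python 3, and the names worker and inspect_process are hypothetical. It uses the optional timeout argument of join() together with is_alive() and exitcode.

```python
import time
from multiprocessing import Process


def worker(seconds):
    # Stand-in for real work.
    time.sleep(seconds)


def inspect_process(seconds=0.5):
    """Show that join() waits for a process but never stops it."""
    p = Process(target=worker, args=(seconds,))
    p.start()
    p.join(timeout=0.05)  # stop waiting after 0.05 s; p keeps running
    alive_after_timeout = p.is_alive()
    p.join()  # now block until p exits on its own
    return alive_after_timeout, p.is_alive(), p.exitcode


if __name__ == "__main__":
    print(inspect_process())
```

The first value shows the process still alive after the timed join gave up waiting, and the second shows it gone once the plain join() has returned. Nothing in the sketch stops the worker early; it exits when its own function returns.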

Tags: python, parallel-processing, multiprocessing, python-multiprocessing

Sounak
