When to call .join() on a process?

Tags:

multiprocessing

I am reading various tutorials on the multiprocessing module in Python, and am having trouble understanding why/when to call process.join(). For example, I stumbled across this example:

    nums = range(100000)
    nprocs = 4

    def worker(nums, out_q):
        """ The worker function, invoked in a process. 'nums' is a
            list of numbers to factor. The results are placed in
            a dictionary that's pushed to a queue.
        """
        outdict = {}
        for n in nums:
            outdict[n] = factorize_naive(n)
        out_q.put(outdict)

    # Each process will get 'chunksize' nums and a queue to put its
    # output dict into
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []

    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    print(resultdict)

From what I understand, process.join() will block the calling process until the process whose join method was called has completed execution. I also believe that the child processes which have been started in the above code example complete execution upon completing the target function, that is, after they have pushed their results to the out_q. Lastly, I believe that out_q.get() blocks the calling process until there are results to be pulled. Thus, if you consider the code:

    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

the main process is blocked by the out_q.get() calls until every single worker process has finished pushing its results to the queue. Thus, by the time the main process exits the for loop, each child process should have completed execution, correct?

If that is the case, is there any reason for calling the p.join() methods at this point? Haven't all worker processes already finished, so how does that cause the main process to "wait for all worker processes to finish?" I ask mainly because I have seen this in multiple different examples, and I am curious if I have failed to understand something.

890

asked Jan 20 '13 21:01

Justin

1 Answer

Try to run this:

    import math
    import multiprocessing
    import time
    from multiprocessing import Queue


    def factorize_naive(n):
        factors = []
        for div in range(2, int(n**.5) + 1):
            while not n % div:
                factors.append(div)
                n //= div
        if n != 1:
            factors.append(n)
        return factors


    def worker(nums, out_q):
        """ The worker function, invoked in a process. 'nums' is a
            list of numbers to factor. The results are placed in
            a dictionary that's pushed to a queue.
        """
        outdict = {}
        for n in nums:
            outdict[n] = factorize_naive(n)
        out_q.put(outdict)


    # The guard keeps child processes from re-running this block on
    # platforms that start them by importing the main module.
    if __name__ == "__main__":
        nums = range(100000)
        nprocs = 4

        # Each process will get 'chunksize' nums and a queue to put its
        # output dict into
        out_q = Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []

        for i in range(nprocs):
            p = multiprocessing.Process(
                    target=worker,
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect all results into a single result dict. We know how many
        # dicts with results to expect.
        resultdict = {}
        for i in range(nprocs):
            resultdict.update(out_q.get())

        # Pause before joining so the finished children can be observed
        time.sleep(5)

        # Wait for all worker processes to finish
        for p in procs:
            p.join()

        print(resultdict)

        # Keep the main process alive so the process list can be inspected
        time.sleep(15)

Then open the task manager (or a process viewer such as top). You should see that the 4 subprocesses sit in the zombie state for a few seconds before the join calls reap them and they disappear from the process list.
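The zombie state can also be checked from Python instead of a process viewer. The following is a minimal sketch, not part of the original answer, and it rests on two assumptions: a Linux system (it reads the one-letter state from /proc/<pid>/stat) and the fork start method. The names proc_state, worker and run_demo are hypothetical, introduced only for illustration. The sketch deliberately avoids is_alive() while waiting, because the multiprocessing documentation notes that calling is_alive() on a finished process joins it.

```python
import multiprocessing
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only),
    or None if the process no longer has an entry."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; split after the last ')'
    # because the command name may itself contain spaces or parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def worker():
    # Nothing to do: the child exits almost immediately.
    pass


def run_demo():
    # Assumption: the fork start method is available (Unix-like systems).
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=worker)
    p.start()

    # Wait (up to 5 seconds) for the child to exit, without joining it.
    deadline = time.time() + 5
    while proc_state(p.pid) != "Z" and time.time() < deadline:
        time.sleep(0.01)

    before = proc_state(p.pid)  # state while finished but not yet joined
    p.join()                    # reaps the child
    after = proc_state(p.pid)   # state once the child has been reaped
    return before, after


if __name__ == "__main__":
    print(run_demo())
```

On Linux the first value should be the zombie state letter Z, and the second should be None because the entry is gone once join() has reaped the child.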

In more complex situations the child processes could stay in the zombie state forever (like the situation you were asking about in another question), and if you create enough child processes you could fill the process table, causing trouble for the OS (which may kill your main process to avoid failures).
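One way to avoid leaving unjoined children behind is to collect the results first and then join every process in a finally block, so the join still happens if collecting the results raises. This is only a sketch under stated assumptions, not the method from the tutorial: square and run_all are hypothetical names, the payloads are tiny, and the fork start method is requested explicitly, which limits the snippet to Unix-like systems. It drains the queue before joining, which is the order the multiprocessing documentation recommends for processes that put items on a queue.

```python
import multiprocessing


def square(n, out_q):
    # Hypothetical worker: push one small (input, result) pair.
    out_q.put((n, n * n))


def run_all(values):
    # Assumption: the fork start method is available (Unix-like systems).
    ctx = multiprocessing.get_context("fork")
    out_q = ctx.Queue()
    procs = [ctx.Process(target=square, args=(v, out_q)) for v in values]
    for p in procs:
        p.start()
    try:
        # One get() per worker; each call blocks until a result arrives.
        results = dict(out_q.get() for _ in procs)
    finally:
        # Always join, so no finished child is left unreaped.
        for p in procs:
            p.join()
    return results, [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run_all([1, 2, 3]))
```

After join() returns, each process's exitcode is set, so a nonzero value can be used to spot a worker that failed.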

176

answered Oct 17 '22 00:10

Bakuriu
