Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Script using multiprocessing module does not terminate

The following code, does not print "here". What is the problem? I tested it on both my machines (windows 7, Ubuntu 12.10), and http://www.compileonline.com/execute_python_online.php It does not print "here" in all cases.

from multiprocessing import Queue, Process


def runLang(que):
    print "start"
    myDict=dict()
    for i in xrange(10000):
        myDict[i]=i
    que.put(myDict)
    print "finish"


def run(fileToAnalyze):
    que=Queue()
    processList=[]
    dicList=[]
    langs= ["chi","eng"]
    for lang in langs:
        p=Process(target=runLang,args=(que,))
        processList.append(p)
        p.start()

    for p1 in processList:
        p1.join()

    print "here"

    for _ in xrange(len(langs)):
        item=que.get()
        print item
        dicList.append(item)

if __name__=="__main__":
    processList = []
    for fileToAnalyse in ["abc.txt","def.txt"]:
        p=Process(target=run,args=(fileToAnalyse,))
        processList.append(p)
        p.start()
    for p1 in processList:
        p1.join()
like image 565
william007 Avatar asked Nov 04 '14 15:11

william007


People also ask

How do you terminate a process in multiprocessing?

A process can be killed by calling the Process. terminate() function. The call will only terminate the target process, not child processes. The method is called on the multiprocessing.

How does multiprocess work in Python?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.

How does multiprocessing lock work in Python?

Python provides a mutual exclusion lock for use with processes via the multiprocessing. Lock class. An instance of the lock can be created and then acquired by processes before accessing a critical section, and released after the critical section. Only one process can have the lock at any time.

How do processes pools work in multiprocessing?

Pool is generally used for heterogeneous tasks, whereas multiprocessing. Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is tasks that do not resemble each other. For example, each task submitted to the process pool may be a different target function.


1 Answers

This is because when you put lots of items into a multiprocessing.Queue, they eventually get buffered in memory, once the underlying Pipe is full. The buffer won't get flushed until something starts reading from the other end of the Queue, which will allow the Pipe to accept more data. A Process cannot terminate until the buffer for all its Queue instances have been entirely flushed to their underlying Pipe. The implication of this is that if you try to join a process without having another process/thread calling get on its Queue, you could deadlock. This is mentioned in the docs:

Warning

As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue.

You can fix the issue by not calling join until after you empty the Queue in the parent:

for _ in xrange(len(langs)):
    item = que.get()
    print(item)
    dicList.append(item)

# join after emptying the queue.
for p in processList:
    p.join()

print("here")
like image 168
dano Avatar answered Nov 02 '22 05:11

dano