
Why do I have to use .wait() with python's subprocess module?

I'm running a Perl script through the subprocess module in Python on Linux. The function that runs the script is called several times with variable input.

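import subprocess

def script_runner(variable_input):
    # One output file and one error file per input value.
    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')

    # Popen returns immediately; the Perl script keeps running in the background.
    process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                               stdout=out_file, stderr=error_file)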

However, if I run this function, say, twice, the execution of the first process will stop when the second process starts. I can get my desired behavior by adding

process.wait()

after calling the script, so I'm not really stuck. However, I want to find out why I cannot run the script using subprocess as many times as I want and have it make these computations in parallel, without having to wait for it to finish between each run.

UPDATE

The culprit was not so exciting: the Perl script used a common file that was rewritten on each execution.
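
A minimal sketch of one way to avoid this kind of clash, assuming the shared file is written relative to the current working directory (the post does not say where it lives): give each run its own scratch directory through Popen's cwd argument. The name run_isolated, the file name shared.tmp, and the stand-in child command (a Python one-liner used here in place of the Perl script) are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the Perl script: it writes its argument to a
# fixed file name, "shared.tmp", in whatever directory it is started in.
CHILD = "import sys; open('shared.tmp', 'w').write(sys.argv[1])"

def run_isolated(variable_input):
    # Each run gets a private scratch directory, so the fixed file name
    # no longer collides between concurrent runs. The directory is left
    # on disk; remove it with shutil.rmtree once the results are read.
    workdir = tempfile.mkdtemp(prefix='run_' + variable_input + '_')
    process = subprocess.Popen([sys.executable, '-c', CHILD, variable_input],
                               cwd=workdir)
    return process, workdir

if __name__ == '__main__':
    runs = [run_isolated(name) for name in ('a', 'b')]
    for process, workdir in runs:
        process.wait()
        with open(os.path.join(workdir, 'shared.tmp')) as f:
            print(workdir, f.read())
```

If the script can take the path of its scratch file as an option, passing a unique path per run would serve the same purpose without changing directories.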

However, the lesson I learned from this is that the garbage collector does not kill a child process when its Popen object is discarded: once I had sorted out the shared file, dropping the reference had no effect on my script.

Viktiglemma asked Nov 12 '10 13:11




1 Answer

If you are using Unix, and wish to run many processes in the background, you could use subprocess.Popen this way:

x_fork_many.py:

import subprocess
import os
import sys
import time
import random
import gc  # Only here to test the hypothesis that garbage collection of p=Popen() is causing the problem.

# This spawns many (3) children in quick succession
# and then reports as each child finishes.
if __name__=='__main__':
    N=3
    if len(sys.argv)>1:
        x=random.randint(1,10)
        print('{p} sleeping for {x} sec'.format(p=os.getpid(),x=x))
        time.sleep(x)
    else:
        for script in range(N):
            # Re-run this same file with an argument so the child takes the sleep branch.
            args=[sys.executable, __file__, 'sleep']
            p = subprocess.Popen(args)
        gc.collect()
        for i in range(N):
            pid,retval=os.wait()
            print('{p} finished'.format(p=pid))

The output looks something like this:

% x_fork_many.py 
15562 sleeping for 10 sec
15563 sleeping for 5 sec
15564 sleeping for 6 sec
15563 finished
15564 finished
15562 finished

I'm not sure why you are getting the strange behavior when not calling .wait(). However, the script above suggests (at least on Unix) that keeping the subprocess.Popen(...) objects in a list or set is not necessary for the children to keep running. Whatever the problem is, I don't think it has to do with garbage collection.
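
If the exit status of each child matters, one alternative is to keep the Popen objects anyway and call wait() on each only after all of them have been started, so the children still run in parallel. The following is a sketch under that assumption; start_all and the stand-in sleep command are hypothetical and take the place of the real script.

```python
import subprocess
import sys
import time

def start_all(durations):
    # Start every child first; Popen returns without waiting for the child.
    # The child command is a stand-in for the real script (an assumption).
    return [subprocess.Popen([sys.executable, '-c',
                              'import time; time.sleep({0})'.format(d)])
            for d in durations]

if __name__ == '__main__':
    start = time.time()
    children = start_all([1, 1, 1])
    # Only now block, once per child; the sleeps overlap.
    codes = [p.wait() for p in children]
    print('exit codes:', codes)
    print('elapsed: {0:.1f} sec'.format(time.time() - start))
```

With three one-second children, the elapsed time should be close to one second rather than three, since wait() is only called once every child is already running.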

PS. Maybe your perl scripts are conflicting in some way, which causes one to end with an error when another one is running. Have you tried starting multiple calls to the perl script from the command line?

unutbu answered Oct 12 '22 20:10