
Python, running command line tools in parallel

I am using Python as a scripting language to do some data processing and to call command-line tools for number crunching. I want to run the command-line tools in parallel, since they are independent of each other. When a tool finishes, I can collect its results from its output file. So I also need some synchronization mechanism that notifies my main Python program when a task has finished, so that the result can be parsed by the main program.

Currently I use os.system(), which works fine in a single thread but blocks until each command finishes, so the commands cannot run in parallel.

Thanks!

Yin Zhu asked Mar 04 '12 11:03


1 Answer

If you want to run command-line tools as separate processes, just use os.system (or better, the subprocess module) to start them asynchronously. On Unix, Linux, or macOS:

subprocess.call("command -flags arguments &", shell=True)

On Windows:

subprocess.call("start command -flags arguments", shell=True)

As for knowing when a command has finished: under Unix you could set something up with wait etc., but if you are writing the command-line scripts yourself, I'd just have them write a message into a file and monitor that file from the calling Python script.
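A minimal sketch of the marker-file idea, under some assumptions: the child process here is a stand-in Python one-liner rather than a real tool, and the names wait_for_marker and job1.done are made up for illustration. The parent simply polls until the file appears or a timeout passes.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_for_marker(path, poll_interval=0.1, timeout=30.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    return os.path.exists(path)


# Hypothetical stand-in for a real tool: a child Python process that
# writes the marker file as its last step.
workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "job1.done")
child_code = "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text('done')"
proc = subprocess.Popen([sys.executable, "-c", child_code, marker])

finished = wait_for_marker(marker)
proc.wait()  # reap the child so it does not linger as a zombie
print("marker seen:", finished)
```

A real tool should create the marker only after its output file is completely written, otherwise the parent may start parsing a half-finished result.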

@James Youngman proposed a solution to your second question, synchronization. If you want to control your processes from Python, you can start them asynchronously with Popen.

# Pass the command as a list; on Unix a plain string needs shell=True.
p1 = subprocess.Popen(["command1", "-flags", "arguments"])
p2 = subprocess.Popen(["command2", "-flags", "arguments"])

Beware that if you capture output with stdout=subprocess.PIPE and your processes write a lot of data that your program never reads, the pipe buffer fills up and the processes block forever. Redirecting all output to a log file avoids this.
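One way to send a child's output to a log file is to pass an open file object as stdout. This is a sketch only: the noisy child is a stand-in Python one-liner, and the log file name command1.log is an arbitrary choice.

```python
import os
import subprocess
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "command1.log")

# Hypothetical noisy tool: prints 100,000 lines to stdout.
noisy = "for i in range(100000): print('line', i)"

with open(log_path, "w") as log:
    # stdout and stderr both go to the log file, so no pipe can fill up.
    proc = subprocess.Popen(
        [sys.executable, "-c", noisy],
        stdout=log,
        stderr=subprocess.STDOUT,
    )
    status = proc.wait()

# Count what ended up in the log to confirm nothing was lost.
with open(log_path) as log:
    line_count = sum(1 for _ in log)
print("exit status:", status, "lines logged:", line_count)
```

The wait() call is used here only to keep the example short; in the parallel setup you would keep the Popen object and poll it instead.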

p1 and p2 are objects that you can use to keep tabs on your processes. p1.poll() will not block, but will return None if the process is still running. It will return the exit status when it is done, so you can check if it is zero.

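import time

running = [p1, p2]
while running:
    time.sleep(60)
    for proc in list(running):
        status = proc.poll()
        if status is None:
            continue  # still running
        running.remove(proc)  # finished: handle it only once
        if status == 0:
            pass  # harvest the answers from this command's output file
        else:
            print("command failed with status", status)

The above is just a model: it checks once a minute, drops each process from the list as soon as it has finished, and exits when nothing is left running. But I trust you get the idea.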
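For a version that can be run as-is, here is a sketch of the same polling idea with result collection added. Each finished process is removed from the running set, so it is harvested once and the loop ends when the set is empty. The helper run_all, the job names, and the child one-liner that writes a number to an output file are all assumptions made for illustration, not real tools.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_all(jobs, poll_interval=0.1):
    """Start every job, then poll until all have exited.

    `jobs` maps a job name to an argument list. Returns {name: exit status}.
    """
    running = {name: subprocess.Popen(args) for name, args in jobs.items()}
    statuses = {}
    while running:
        for name, proc in list(running.items()):
            status = proc.poll()
            if status is None:
                continue  # still running
            statuses[name] = status
            del running[name]  # harvest each process only once
        if running:
            time.sleep(poll_interval)
    return statuses


# Hypothetical stand-ins for real tools: each child writes a number
# to its own output file.
outdir = tempfile.mkdtemp()
child_code = "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text(sys.argv[2])"
jobs = {}
out_paths = {}
for name, value in [("job1", "10"), ("job2", "32")]:
    out_paths[name] = os.path.join(outdir, name + ".out")
    jobs[name] = [sys.executable, "-c", child_code, out_paths[name], value]

statuses = run_all(jobs)

# Parse the output file of every job that exited successfully.
results = {}
for name, status in statuses.items():
    if status == 0:
        with open(out_paths[name]) as f:
            results[name] = int(f.read())
print(statuses, results)
```

The short poll interval suits quick jobs; for long-running number crunching a longer sleep costs nothing and keeps the parent idle.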

alexis answered Oct 08 '22 00:10