
Python subprocess in parallel

I want to run many processes in parallel, with the ability to read their stdout at any time. How should I do it? Do I need to run a thread for each subprocess.Popen() call, or what?

asked Mar 16 '12 by sashab

People also ask

Does Python subprocess run parallel?

We can use the subprocess module to create multiple child processes, and they run in parallel. First, we search the current directory and obtain a list of all the compressed files. Next, we create a list of the sequences of program arguments, one list element per file.
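The pattern above can be sketched in a few lines. This is a minimal, self-contained example: instead of compressing files, each child simply echoes its argument (a hypothetical stand-in for a real command such as ["gzip", name]).

```python
import sys
from subprocess import Popen

# one argument list per task; here the children just echo their argument
cmds = [[sys.executable, "-c", "import sys; print(sys.argv[1])", str(i)]
        for i in range(3)]

# start all children at once -- Popen returns immediately, so they run in parallel
procs = [Popen(cmd) for cmd in cmds]

# wait for every child to finish and collect exit codes
exit_codes = [p.wait() for p in procs]
print(exit_codes)  # [0, 0, 0]
```

Note that the children run concurrently between the Popen calls and the final wait() loop; only the collection of exit codes is sequential.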

What is subprocess Popen in Python?

The subprocess module defines one class, Popen and a few wrapper functions that use that class. The constructor for Popen takes arguments to set up the new process so the parent can communicate with it via pipes. It provides all of the functionality of the other modules and functions it replaces, and more.
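A short sketch of that parent-child pipe communication, using communicate() (the child program here is an illustrative one-liner, not anything from the question):

```python
import sys
from subprocess import Popen, PIPE

# the Popen constructor wires up pipes so the parent can talk to the child
p = Popen([sys.executable, "-c", "print(input().upper())"],
          stdin=PIPE, stdout=PIPE, universal_newlines=True)

# communicate() writes the input, reads all output, and waits for the child
out, _ = p.communicate("hello\n")
print(out)  # HELLO
```

communicate() is the safe way to use both stdin and stdout pipes at once, since it avoids the deadlocks that manual read/write ordering can cause.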

Is subprocess asynchronous Python?

Yes — asyncio provides non-blocking subprocess support. The asyncio.subprocess.Process.wait() method is asynchronous, whereas the subprocess.Popen.wait() method is implemented as a blocking busy loop; also, the universal_newlines parameter is not supported.
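A minimal sketch of the asyncio approach, running several children concurrently on a single thread (the echo one-liner is illustrative, not from the original question):

```python
import asyncio
import sys

async def run(i):
    # asyncio's subprocess support awaits the child without blocking the event loop
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import sys; print(sys.argv[1])", str(i),
        stdout=asyncio.subprocess.PIPE)
    out, _ = await proc.communicate()
    return out.decode().strip()

async def main():
    # gather() runs all the coroutines concurrently and preserves their order
    return await asyncio.gather(*(run(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # ['0', '1', '2']
```

This requires Python 3.7+ for asyncio.run(); on earlier versions you would drive the event loop manually.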


1 Answer

You can do it in a single thread.

Suppose you have a script that prints lines at random times:

#!/usr/bin/env python
#file: child.py
import os
import random
import sys
import time

for i in range(10):
    print("%2d %s %s" % (int(sys.argv[1]), os.getpid(), i))
    sys.stdout.flush()
    time.sleep(random.random())

If you'd like to collect the output as soon as it becomes available, you can use select() on POSIX systems, as @zigg suggested:

#!/usr/bin/env python
from __future__ import print_function
from select     import select
from subprocess import Popen, PIPE

# start several subprocesses
processes = [Popen(['./child.py', str(i)], stdout=PIPE,
                   bufsize=1, close_fds=True,
                   universal_newlines=True)
             for i in range(5)]

# read output
timeout = 0.1 # seconds
while processes:
    # remove finished processes from the list (O(N**2))
    for p in processes[:]:
        if p.poll() is not None: # process ended
            print(p.stdout.read(), end='') # read the rest
            p.stdout.close()
            processes.remove(p)

    # wait until there is something to read
    rlist = select([p.stdout for p in processes], [], [], timeout)[0]

    # read a line from each process that has output ready
    for f in rlist:
        print(f.readline(), end='') # NOTE: it can block

A more portable solution (which should work on Windows, Linux, and OS X) uses a reader thread for each process; see Non-blocking read on a subprocess.PIPE in python.
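Here is a minimal sketch of that reader-thread pattern (not the linked answer's exact code): one thread per child drains its pipe into a shared queue, so a blocking readline() never stalls the main thread. The echo one-liner stands in for child.py to keep the example self-contained.

```python
import sys
from queue import Queue
from subprocess import Popen, PIPE
from threading import Thread

def reader(proc, queue):
    # drain one child's stdout; blocking reads are fine in a dedicated thread
    for line in proc.stdout:
        queue.put(line)
    proc.stdout.close()

q = Queue()
procs = [Popen([sys.executable, "-c", "import sys; print(sys.argv[1])", str(i)],
               stdout=PIPE, universal_newlines=True)
         for i in range(3)]
threads = [Thread(target=reader, args=(p, q)) for p in procs]
for t in threads:
    t.start()

# the main thread could consume q.get() here as lines arrive;
# for brevity we just wait for everything to finish
for t in threads:
    t.join()
for p in procs:
    p.wait()

lines = [q.get().strip() for _ in range(q.qsize())]
print(sorted(lines))  # ['0', '1', '2']
```

The queue decouples production (the reader threads) from consumption (the main thread), which is what makes the reads effectively non-blocking from the caller's point of view.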

Here's an os.pipe()-based solution that works on Unix and Windows:

#!/usr/bin/env python
from __future__ import print_function
import io
import os
import sys
from subprocess import Popen

ON_POSIX = 'posix' in sys.builtin_module_names

# create a pipe to get data
input_fd, output_fd = os.pipe()

# start several subprocesses
processes = [Popen([sys.executable, 'child.py', str(i)], stdout=output_fd,
                   close_fds=ON_POSIX) # close input_fd in children
             for i in range(5)]
os.close(output_fd) # close unused end of the pipe

# read output line by line as soon as it is available
with io.open(input_fd, 'r', buffering=1) as file:
    for line in file:
        print(line, end='')

for p in processes:
    p.wait()
answered Oct 14 '22 by jfs