I want to run many processes in parallel with the ability to read their stdout at any time. How should I do it? Do I need to run a thread for each subprocess.Popen() call, or what?
You can do it in a single thread.
Suppose you have a script that prints lines at random times:
#!/usr/bin/env python
# file: child.py
import os
import random
import sys
import time

for i in range(10):
    print("%2d %s %s" % (int(sys.argv[1]), os.getpid(), i))
    sys.stdout.flush()
    time.sleep(random.random())
If you'd like to collect the output as soon as it becomes available, you could use select() on POSIX systems, as @zigg suggested:
#!/usr/bin/env python
from __future__ import print_function
from select import select
from subprocess import Popen, PIPE

# start several subprocesses
processes = [Popen(['./child.py', str(i)], stdout=PIPE, bufsize=1,
                   close_fds=True, universal_newlines=True)
             for i in range(5)]

# read output
timeout = 0.1  # seconds
while processes:
    # remove finished processes from the list (O(N**2))
    for p in processes[:]:
        if p.poll() is not None:  # process ended
            print(p.stdout.read(), end='')  # read the rest
            p.stdout.close()
            processes.remove(p)

    # wait until there is something to read
    rlist = select([p.stdout for p in processes], [], [], timeout)[0]

    # read a line from each process that has output ready
    for f in rlist:
        print(f.readline(), end='')  # NOTE: it can block
A more portable solution (one that should work on Windows, Linux, and OS X) uses a reader thread for each process; see Non-blocking read on a subprocess.PIPE in Python. A sketch of that approach follows.
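Here's a minimal sketch of that thread-per-process approach, reusing the child.py script from above. It's an illustration of the technique, not the code from the linked answer: each reader thread copies lines from one child's pipe into a shared Queue and puts a None sentinel at EOF, and the main thread drains the queue until it has seen one sentinel per child:

#!/usr/bin/env python
from __future__ import print_function
import sys
from subprocess import Popen, PIPE
from threading import Thread
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

def reader(pipe, queue):
    # copy lines from one child's pipe into the shared queue;
    # a None sentinel signals that this pipe reached EOF
    try:
        with pipe:
            for line in iter(pipe.readline, ''):
                queue.put(line)
    finally:
        queue.put(None)

# start several subprocesses, one reader thread per process
q = Queue()
processes = [Popen([sys.executable, 'child.py', str(i)],
                   stdout=PIPE, bufsize=1, universal_newlines=True)
             for i in range(5)]
for p in processes:
    Thread(target=reader, args=[p.stdout, q]).start()

# print lines as soon as any child produces them,
# until every reader has posted its sentinel
for _ in processes:
    for line in iter(q.get, None):
        print(line, end='')

for p in processes:
    p.wait()

Unlike the select() version, nothing here calls select(), so it works on Windows too; the cost is one extra thread per child process.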
Here's an os.pipe()-based solution that works on both Unix and Windows:
#!/usr/bin/env python
from __future__ import print_function
import io
import os
import sys
from subprocess import Popen

ON_POSIX = 'posix' in sys.builtin_module_names

# create a pipe to get data
input_fd, output_fd = os.pipe()

# start several subprocesses
processes = [Popen([sys.executable, 'child.py', str(i)], stdout=output_fd,
                   close_fds=ON_POSIX)  # close input_fd in children
             for i in range(5)]
os.close(output_fd)  # close unused end of the pipe

# read output line by line as soon as it is available
with io.open(input_fd, 'r', buffering=1) as file:
    for line in file:
        print(line, end='')

for p in processes:
    p.wait()
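Note that in this last version all five children share the single write end of the pipe, so their output arrives merged into one stream, and the reading loop only ends after every child has closed its copy of output_fd. This is also why child.py prints its argument (sys.argv[1]) at the start of each line and flushes after every print: the prefix tells you which child produced a line, and flushing whole lines at a time keeps them from being interleaved mid-line (on POSIX, pipe writes of up to PIPE_BUF bytes are atomic).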