I have a Python script that runs a few external commands using the subprocess module. One of these steps takes a very long time, so I would like to run it as several parallel processes. I need to launch them, check that they have all finished, and then execute the next command, which is not parallel. My code is something like this:
nproc = 24
for i in xrange(nproc):
    #Run program in parallel

#Combine files generated by the parallel step
for i in xrange(nproc):
    handle = open('Niben_%s_structures' % (zfile_name), 'w')
    for i in xrange(nproc):
        for zline in open('Niben_%s_file%d_structures' % (zfile_name,i)):handle.write(zline)
    handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name,zfile_name)
If we want to schedule the function one call at a time, we can use an executor's submit() method. It schedules the target function for execution and returns a Future object. The Future encapsulates the function's execution and lets us check whether it is still running or already done, and fetch the return value with result().
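As a small illustration of submit(), here is a sketch that assumes the ThreadPoolExecutor class from the standard concurrent.futures module and Python 3. The function slow_square is a made-up stand-in for real work, not something from the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Hypothetical stand-in for a slow task."""
    time.sleep(0.1)
    return n * n


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules one call and returns a Future immediately.
    future = executor.submit(slow_square, 7)
    # result() blocks until the call has finished, then returns its value.
    value = future.result()
    # Once result() has returned, the Future reports that it is done.
    finished = future.done()

print(value, finished)
```

Calling future.done() before result() is also allowed; it returns right away and simply tells you whether the call has completed yet.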
One way to achieve parallelism in Python is by using the multiprocessing module. The multiprocessing module allows you to create multiple processes, each of them with its own Python interpreter. For this reason, Python multiprocessing accomplishes process-based parallelism.
Process-Based Parallelism
With this approach, several processes can be started at the same time and perform their calculations concurrently. The multiprocessing package has been part of the standard library since Python 2.6, so no separate installation is needed, and it gives us a convenient syntax for launching concurrent processes.
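A minimal sketch of process-based parallelism with multiprocessing.Pool might look like the following. It assumes Python 3; the names square and parallel_squares are hypothetical and chosen only for illustration.

```python
import multiprocessing


def square(n):
    """Work done inside a worker process (must be defined at module level)."""
    return n * n


def parallel_squares(numbers, workers=2):
    """Distribute the calls to square() over a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() blocks until every result is available and keeps input order.
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this part on platforms
    # that start workers by importing the main module.
    print(parallel_squares(range(5)))
```

This fits CPU-bound Python functions. For merely launching external programs, each program is already its own OS process, so a pool of Python workers adds little.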
For your example, you just want to shell out in parallel - you don't need threads for that.
Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm
Collect the Popen instances for each process you spawned and then wait() for them to finish:
import subprocess

procs = []
for i in xrange(nproc):
    procs.append(subprocess.Popen(ARGS_GO_HERE)) #Run program in parallel
for p in procs:
    p.wait() #Block until this process has finished
You can get away with this (as opposed to using the multiprocessing or threading modules), since you aren't really interested in having these interoperate: you just want the OS to run them in parallel and be sure they are all finished when you go to combine the results.
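To make the launch, wait and combine sequence concrete, here is a self-contained sketch written for Python 3 (range instead of xrange). The child command is a hypothetical stand-in that only writes one small file, and the names run_parallel_and_combine, part%d.txt and combined.txt are invented for the example; in the real script the argument list would be the actual external program. Checking the exit status returned by wait() is an addition that seems worth having before combining output.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the real external program: writes one line to a file.
CHILD_CODE = (
    "import sys, pathlib; "
    "pathlib.Path(sys.argv[1]).write_text('part %s\\n' % sys.argv[2])"
)


def run_parallel_and_combine(nproc, workdir):
    """Start nproc children, wait for all of them, then merge their output files."""
    procs = []
    for i in range(nproc):
        part_path = os.path.join(workdir, "part%d.txt" % i)
        # Popen returns immediately, so all children run side by side.
        procs.append(
            subprocess.Popen([sys.executable, "-c", CHILD_CODE, part_path, str(i)])
        )

    # wait() blocks until that child exits and returns its exit status.
    for i, proc in enumerate(procs):
        if proc.wait() != 0:
            raise RuntimeError("child %d exited with code %d" % (i, proc.returncode))

    # Only now is it safe to combine the files produced by the parallel step.
    combined_path = os.path.join(workdir, "combined.txt")
    with open(combined_path, "w") as combined:
        for i in range(nproc):
            with open(os.path.join(workdir, "part%d.txt" % i)) as part:
                combined.write(part.read())
    return combined_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        with open(run_parallel_and_combine(4, workdir)) as combined:
            print(combined.read(), end="")
```

The non-parallel next step (bowtie-build in the question) would go after the function returns, since by then every child has exited.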