How to run parallel programs in python

Tags:

python

I have a Python script that runs a few external commands using the subprocess module. One of these steps takes a very long time, so I would like to run it as several parallel processes. I need to launch them, check that they have all finished, and then execute the next command, which is not parallel. My code is something like this:

nproc = 24
for i in xrange(nproc):
    pass  # Run program in parallel

# Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
        handle.write(zline)
handle.close()

# Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name, zfile_name)

user1598231 asked Aug 14 '12 14:08



1 Answer

For your example, you just want to shell out in parallel - you don't need threads for that.

Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm

Collect the Popen instances for each process you spawned and then wait() for them to finish:

import subprocess

procs = []
for i in xrange(nproc):
    # Popen returns immediately, so all nproc commands run concurrently
    procs.append(subprocess.Popen(ARGS_GO_HERE))
for p in procs:
    p.wait()  # block until this process has exited
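
For reference, here is a minimal self-contained sketch of the same pattern, written for Python 3 (so range replaces xrange). The real ARGS_GO_HERE are not given, so the command below is a hypothetical stand-in that just starts a child Python interpreter. The run_in_parallel helper is also an illustrative name, not part of the subprocess module. It additionally collects the exit status that wait() returns, so a failed command can be detected before moving on.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, wait for all, and return their exit codes."""
    # Popen returns as soon as the child has started, so this loop does not block.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that child exits and returns its exit status.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    nproc = 4  # assumed small value for the demo
    # Hypothetical placeholder command: a child interpreter that exits at once.
    commands = [[sys.executable, "-c", "import sys; sys.exit(0)"]
                for _ in range(nproc)]
    codes = run_in_parallel(commands)
    failed = [i for i, code in enumerate(codes) if code != 0]
    if failed:
        raise RuntimeError("commands failed: %s" % failed)
    print("all %d processes finished" % nproc)
```

Passing each command as a list of arguments avoids going through a shell; if the real command is a single string, it would need to be split into a list or run with shell=True.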

You can get away with this (as opposed to using the multiprocessing or threading modules) because the processes don't need to interoperate: you just want the OS to run them in parallel and to be sure they have all finished before you go to combine the results.
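
To make the "all finished before combining" point concrete, here is a rough Python 3 sketch of the whole sequence: parallel step, then combine, then the sequential step. The file names, the combine_files and run_pipeline helpers, and the child command are all hypothetical stand-ins, not the asker's real pipeline. The final sequential command is left as a comment because its inputs are not shown.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def combine_files(part_paths, dest_path):
    """Concatenate the part files, in order, into dest_path."""
    with open(dest_path, "wb") as dest:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, dest)


def run_pipeline(workdir, nproc):
    """Write nproc part files in parallel, then merge them into one file."""
    parts = [os.path.join(workdir, "part%d.txt" % i) for i in range(nproc)]
    # Stand-in worker: each child writes one line to its own output file.
    script = "import sys; open(sys.argv[1], 'w').write('part ' + sys.argv[2] + '\\n')"
    procs = [subprocess.Popen([sys.executable, "-c", script, path, str(i)])
             for i, path in enumerate(parts)]
    # Wait for every child first, then check that all of them succeeded.
    codes = [p.wait() for p in procs]
    if any(code != 0 for code in codes):
        raise RuntimeError("worker exit codes: %s" % codes)
    combined = os.path.join(workdir, "combined.txt")
    combine_files(parts, combined)
    # A sequential next step would go here, e.g. subprocess.check_call([...]),
    # which blocks until that single command has finished.
    return combined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        with open(run_pipeline(workdir, 3)) as f:
            print(f.read(), end="")
```

Because the combine step only starts after every wait() has returned, it never reads a part file that a child is still writing.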

Daren Thomas answered Oct 12 '22 18:10