I’m having trouble calling an external program from my python script in which I want to use mpi4py to distribute the workload among different processors.
Basically, I want to use my script so that each core prepares some input files for calculations in separate folders, then starts an external program in that folder, waits for the output, and finally reads and collects the results.
However, I simply cannot get the external program call to work. While searching for a solution to this problem, I've found that the problems I'm facing seem to be quite fundamental. The following simple example makes this clear:
#!/usr/bin/env python
import subprocess
subprocess.call("EXTERNAL_PROGRAM", shell=True)
subprocess.call("echo test", shell=True)
Running ./script.py works fine (both calls work), while mpirun -np 1 ./script.py only outputs test. Is there any workaround for this situation? The program is definitely in my PATH, but it also fails if I use the absolute path for the call.
This SO question seems to be related, but sadly there are no answers...
EDIT:
In the original version of my question I had not included any code using mpi4py, even though I mention this module in the title. So here is a more elaborate example of the code:
#!/usr/bin/env python
import os
import subprocess
from mpi4py import MPI

def worker(parameter=None):
    """Make new folder, cd into it, prepare the config files and execute the
    external program."""
    cwd = os.getcwd()
    dir = "_calculation_" + parameter
    dir = os.path.join(cwd, dir)
    os.makedirs(dir)
    os.chdir(dir)
    # Write input for simulation & execute
    subprocess.call("echo {} > input.cfg".format(parameter), shell=True)
    subprocess.call("EXTERNAL_PROGRAM", shell=True)
    # After the program is finished, do something here with the output files
    # and return the data. I'm using the input parameter as a dummy variable
    # for the processed output.
    data = parameter
    os.chdir(cwd)
    return data

def run_parallel():
    """Iterate over job_args in parallel."""
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    if rank == 0:
        # Here should normally be a list with many more entries, subdivided
        # among all the available cores. I'll keep it simple here, so one has
        # to run this script with mpirun -np 2 ./script.py
        job_args = ["a", "b"]
    else:
        job_args = None
    job_arg = comm.scatter(job_args, root=0)
    res = worker(parameter=job_arg)
    results = comm.gather(res, root=0)
    print res
    print results

if __name__ == '__main__':
    run_parallel()
Unfortunately I cannot provide more details about the external executable EXTERNAL_PROGRAM, other than that it is a C++ application which is MPI enabled. As written in the comment section below, I suspect that this is the reason (or one of the reasons) why my external program call is basically ignored.
Please note that I'm aware nobody can reproduce my exact situation. Still, I was hoping that someone here has already run into similar problems and might be able to help.
For completeness, the OS is Ubuntu 14.04 and I’m using OpenMPI 1.6.5.
In your first example you might be able to do this:
#!/usr/bin/env python
import subprocess
subprocess.call("EXTERNAL_PROGRAM && echo test", shell=True)
The Python script is only facilitating the MPI call. You could just as well write a bash script with the command "EXTERNAL_PROGRAM && echo test" and mpirun the bash script; it would be equivalent to mpirunning the Python script.
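For example, a minimal wrapper along those lines (the file name run.sh is just a placeholder) could be:
#!/bin/bash
EXTERNAL_PROGRAM && echo test
which you would then launch with mpirun -np 1 ./run.sh.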
The second example will not work if EXTERNAL_PROGRAM is MPI enabled. Using mpi4py initializes MPI, and you cannot spawn another MPI program once the MPI environment has been initialized in such a manner. You could spawn it using MPI_Comm_spawn or MPI_Comm_spawn_multiple together with the -up option to mpirun. For mpi4py, refer to the Compute PI example for spawning (use MPI.COMM_SELF.Spawn).
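As a rough sketch of that spawning approach in mpi4py (assuming EXTERNAL_PROGRAM can be launched this way and takes no arguments; the process count is just a placeholder), the parent side could look like this:
#!/usr/bin/env python
from mpi4py import MPI

# Spawn the MPI-enabled external program as a child job from the already
# initialized MPI environment; maxprocs=1 is a placeholder value.
child = MPI.COMM_SELF.Spawn("EXTERNAL_PROGRAM", args=[], maxprocs=1)

# The returned intercommunicator could be used to exchange data with the
# child if the external program supports that; here we only disconnect
# once the child has finished.
child.Disconnect()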