I have a large parallel (MPI) simulation application which produces large amounts of data, which I then evaluate with a Python script. What I now need to do is run this application a large number of times (>1000) and calculate statistical properties from the resulting data.
My approach up until now is to have a Python script running in parallel (using mpi4py on, e.g., 48 nodes) that calls the simulation code using subprocess.check_call. I need this call to run my MPI simulation application in serial; the simulation does not also need to run in parallel in this case. The Python script can then analyze the data in parallel and, after finishing, start a new simulation run, until a large number of runs has accumulated.
Stub MWE:
multi_call_master.py:

from mpi4py import MPI
import subprocess

print "Master hello"

call_string = 'python multi_call_slave.py'

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

print "rank %d of size %d in master calling: %s" % (rank, size, call_string)

std_outfile = "./sm_test.out"
nr_samples = 1

for samples in range(0, nr_samples):
    with open(std_outfile, 'w') as out:
        subprocess.check_call(call_string, shell=True, stdout=out)
    # analyze_data()
    # communicate_results()
multi_call_slave.py (this would be the C simulation code):

from mpi4py import MPI

print "Slave hello"

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

print "rank %d of size %d in slave" % (rank, size)
This will not work. Resulting output in stdout:
Master hello
rank 1 of size 2 in master calling: python multi_call_slave_so.py
Master hello
rank 0 of size 2 in master calling: python multi_call_slave_so.py
[cli_0]: write_line error; fd=7 buf=:cmd=finalize
:
system msg for write_line failure : Broken pipe
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(311).....: MPI_Finalize failed
MPI_Finalize(229).....:
MPID_Finalize(150)....:
MPIDI_PG_Finalize(126): PMI_Finalize failed, error -1
[cli_1]: write_line error; fd=8 buf=:cmd=finalize
:
system msg for write_line failure : Broken pipe
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(311).....: MPI_Finalize failed
MPI_Finalize(229).....:
MPID_Finalize(150)....:
MPIDI_PG_Finalize(126): PMI_Finalize failed, error -1
Resulting output in sm_test.out:
Slave hello
rank 0 of size 2 in slave
The reason is that the subprocess assumes it is being run as part of the surrounding parallel application, whereas I intend to run it as a serial application. As a very "hacky" workaround I did the following: I compiled the simulation code against a different MPI library than the one used for the Python script. If I then start my parallel Python script using, for example, Intel MPI, the underlying simulation is not aware of the surrounding parallel environment, since it uses a different library.
This worked fine for a while, but unfortunately it is not very portable and is difficult to maintain on different clusters for various reasons.
I could try srun or the MPI_Comm_spawn technique in Python; however, simply using mpirun -n 1 or srun for the subprocess call does not help. Is there any elegant, official way of doing this? I am really out of ideas and appreciate any input!
No, there is neither an elegant nor an official way to do this. The only officially supported way to execute other programs from within an MPI application is the use of MPI_Comm_spawn. Spawning child MPI processes via simple OS mechanisms like the one provided by subprocess is dangerous and could even have catastrophic consequences in certain cases.
While MPI_Comm_spawn does not provide a mechanism to find out when the child process has exited, you could kind of simulate it with an intercomm barrier. You will still face problems, since the MPI_Comm_spawn call does not allow for the standard I/O to be redirected arbitrarily; instead it gets redirected to mpiexec/mpirun.
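To make the spawn-plus-barrier idea concrete, here is a minimal sketch in mpi4py. It assumes the simulation has an MPI entry point that can be spawned (the file name multi_call_slave.py is just a placeholder), and that the child calls Barrier and Disconnect on its parent intercommunicator when it is done:

from mpi4py import MPI
import sys

# Each master rank spawns a single child from COMM_SELF, so the child
# forms its own one-process MPI job instead of joining the master's job.
child = MPI.COMM_SELF.Spawn(sys.executable,
                            args=['multi_call_slave.py'],
                            maxprocs=1)

# There is no direct "wait until the child exits"; the barrier on the
# intercommunicator only completes once the child reaches its matching
# Barrier, which approximates waiting for it to finish its work.
child.Barrier()
child.Disconnect()

On the child side (inside the hypothetical multi_call_slave.py) the matching calls would be:

parent = MPI.Comm.Get_parent()
# ... do the simulation work ...
parent.Barrier()
parent.Disconnect()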
What you could do is to write a wrapper script that deletes all possible pathways that the MPI library might use in order to pass session information around. For Open MPI that would be any environment variable that starts with OMPI_. For Intel MPI that would be variables that start with I_. And so on. Some libraries might use files, shared memory blocks, or some other OS mechanisms, and you'll have to take care of those too. Once any possible mechanism to communicate MPI session information has been eradicated, you could simply start the executable and it should form a singleton MPI job (that is, behave as if run with mpiexec -n 1).
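A minimal sketch of that wrapper idea in Python, assuming the session information really is passed only through environment variables. The prefix list below (PMI_, PMIX_, MPICH_, HYDRA_ in addition to OMPI_ and I_MPI_) is a guess you would have to adapt to your MPI library, and ./simulation is a placeholder for the actual binary:

import os
import subprocess

# Prefixes of environment variables that may carry MPI session information.
# This list is an assumption; check what your MPI library actually sets.
MPI_ENV_PREFIXES = ('OMPI_', 'PMIX_', 'PMI_', 'I_MPI_', 'MPICH_', 'HYDRA_')

def run_singleton(cmd, stdout_path):
    # Copy the environment and drop anything that could attach the child
    # to the surrounding parallel job, then launch it as a singleton.
    clean_env = {k: v for k, v in os.environ.items()
                 if not k.startswith(MPI_ENV_PREFIXES)}
    with open(stdout_path, 'w') as out:
        subprocess.check_call(cmd, shell=True, stdout=out, env=clean_env)

run_singleton('./simulation', './sm_test.out')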