When starting an MPI job with mpirun
or mpiexec
, I can understand how one might go about starting each individual process. However, without any compiler magic, how do these wrapper executables communicate the arrangement (MPI communicator) to the MPI processes?
I am interested in the details, or a pointer on where to look.
MPI assigns an integer to each process beginning with 0 for the parent process and incrementing each time a new process is created. A process ID is also called its "rank". MPI also provides routines that let the process determine its process ID, as well as the number of processes that are have been created.
The processes communicate via calls to MPI communication primitives. Typically, each process executes in its own address space, although shared-memory implementations of MPI are possible. This document specifies the behavior of a parallel program assuming that only MPI calls are used for communication.
The functions MPI_INIT and MPI_FINALIZE are used to initiate and shut down an MPI computation, respectively. MPI_INIT must be called before any other MPI function and must be called exactly once per process. No further MPI functions can be called after MPI_FINALIZE.
Details on how individual processes establish the MPI universe are implementation specific. You should look into the source code of the specific library in order to understand how it works. There are two almost universal approaches though:
MPI_Init()
with argc
and argv
in C - thus the library can get access to the command line and extract all arguments that are meant for it;Open MPI for example sets environment variables and also writes some universe state in a disk location known to all processes that run on the same node. You can easily see the special variables that its run-time component ORTE (OpenMPI Run-Time Environment) uses by executing a command like mpirun -np 1 printenv
:
$ mpiexec -np 1 printenv | grep OMPI
... <many more> ...
OMPI_MCA_orte_hnp_uri=1660944384.0;tcp://x.y.z.t:43276;tcp://p.q.r.f:43276
OMPI_MCA_orte_local_daemon_uri=1660944384.1;tcp://x.y.z.t:36541
... <many more> ...
(IPs changed for security reasons)
Once a child process is launched remotely and MPI_Init()
or MPI_Init_thread()
is called, ORTE kicks in and reads those environment variables. Then it connects back to the specified network address with the "home" mpirun
/mpiexec
process which then coordinates all spawned processes into establishing the MPI universe.
Other MPI implementations work in a similar fashion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With