I'm working with MPI programs on an SMP supercomputer. I would like to identify which processes are sharing the same node, for example by setting an integer key that is equal in all processes on the same node, and different from one node to another. The goal would then be to use this key to split a communicator and have sub-communicators gathering only the processes on the same node.
So the function would look like
int identify_node(MPI_Comm* comm); // returns a key characterizing a node
Assuming a simple distribution of processes like 0,1,2,3 on node_1, 4,5,6,7 on node_2, etc., the key is a matter of a simple formula (here, rank / 4), but I would like to achieve the same result without any assumption about the distribution.
I have an idea how to do that using MPI_Get_processor_name: compute a hash of each name and assume no two distinct names will get the same hash (I don't like this, because if two names ever do collide, the problem will be difficult to track down), or use some kind of agreement algorithm across processes (which one? I don't know yet).
How would you do that (efficiently if possible)?
Matthieu
MPI_Comm_split is the simplest way to split a communicator into multiple, non-overlapping communicators. Its signature is: int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm). comm is the communicator to split; all processes passing the same color end up in the same new communicator, and key determines the relative rank ordering within it.
Edit: Groups are objects that represent sets of processes; a communicator is a set of processes that may communicate with each other, built on top of a group. They are distinct entities and should not be confused with each other.
You're right that an assumption on the distribution would be unwise, since rank reordering is actually an up-and-coming technique for improving performance at the cost of that regularity.
A good hashing algorithm applied to the return value of MPI_Get_processor_name should be pretty safe, but if you want to double-check, you could always gather up the actual names within each resulting group using MPI_Gatherv and compare them directly.
It seems this question addresses the same concerns.