MPI collective operations from one communicator to another

I have an application that is parallelized with MPI and split into a number of different tasks. Each processor is assigned only one task, and the group of processors assigned the same task gets its own communicator. Periodically, the tasks need to synchronize. Currently, the synchronization is done via MPI_COMM_WORLD, but that has the drawback that no collective operations can be used, since there is no guarantee that the other tasks will ever reach that block of code.

As a more concrete example:

task1: equation1_solver, N nodes, communicator: mpi_comm_solver1
task2: equation2_solver, M nodes, communicator: mpi_comm_solver2
task3: file IO         , 1 node , communicator: mpi_comm_io

I would like to MPI_SUM an array on task1 and have the result appear at task3. Is there an efficient way to do this? (My apologies if this is a stupid question; I don't have much experience with creating and using custom MPI communicators.)

asked Jan 16 '23 by mgilson

1 Answer

Charles is exactly right; intercommunicators allow you to talk between communicators (or, to distinguish "normal" communicators in this context, "intra-communicators", which doesn't strike me as much of an improvement).

I've always found the use of these intercommunicators a little confusing for those new to them. Not the basic idea, which makes sense, but the mechanics of using (say) MPI_Reduce with one of them. The group of tasks doing the reduction specifies the root's rank in the remote communicator; so far so good. But within the remote communicator, everyone who is not the root specifies MPI_PROC_NULL as the root, whereas the actual root specifies MPI_ROOT. The things one does for backwards compatibility, hey?
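To distill just that convention before the full program below, here's a minimal sketch (the helper name intercomm_sum and its arguments are illustrative, not part of the program that follows; it assumes intercomm already connects the two groups and that the result should land on rank 0 of the receiving group):

#include <mpi.h>

/* Hypothetical helper: every task in the contributing group calls this
 * with is_sender = 1; every task in the receiving group calls it with
 * is_sender = 0.  The sum lands on rank 0 of the receiving group. */
int intercomm_sum(MPI_Comm intercomm, int val, int is_sender)
{
    int result = 0;
    if (is_sender) {
        /* contributors name the root by its rank in the REMOTE group;
         * the receive buffer is not significant on this side */
        MPI_Reduce(&val, NULL, 1, MPI_INT, MPI_SUM, 0, intercomm);
    } else {
        int rank;
        MPI_Comm_rank(intercomm, &rank);   /* rank in the LOCAL group */
        /* the receiving root passes MPI_ROOT; everyone else in the
         * receiving group passes MPI_PROC_NULL */
        MPI_Reduce(NULL, &result, 1, MPI_INT, MPI_SUM,
                   (rank == 0) ? MPI_ROOT : MPI_PROC_NULL, intercomm);
    }
    return result;
}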

#include <mpi.h>
#include <stdio.h>


int main(int argc, char **argv)
{
    int commnum = 0;         /* which of the 3 comms I belong to */
    MPI_Comm   mycomm;       /* Communicator I belong to */
    MPI_Comm   intercomm;    /* inter-communicator */
    int cw_rank, cw_size;    /* size, rank in MPI_COMM_WORLD */
    int rank;                /* rank in local communicator */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &cw_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &cw_size);

    if (cw_rank == cw_size-1)      /* last task is IO task */
        commnum = 2;
    else {
        if (cw_rank < (cw_size-1)/2)
            commnum = 0;
        else
            commnum = 1;
    }

    printf("Rank %d in comm %d\n", cw_rank, commnum);

    /* create the local communicator, mycomm */
    MPI_Comm_split(MPI_COMM_WORLD, commnum, cw_rank, &mycomm);

    const int lldr_tag = 1;
    const int intercomm_tag = 2;
    if (commnum == 0) {
        /* comm 0 needs to communicate with comm 2. */
        /* create an intercommunicator: */

        /* rank 0 in our new communicator will be the "local leader"
         *  of this communicator for the purpose of the intercommunicator */
        int local_leader = 0;

        /* Now, since we're not part of the other communicator (and vice
         * versa) we have to refer to the "remote leader" in terms of its
         * rank in COMM_WORLD.   For us, that's easy; the remote leader
         * in the IO comm is defined to be cw_size-1, because that's the
         * only task in that comm.   But for them, it's harder.  So we'll
         * send that task the id of our local leader. */

        /* find out which rank in COMM_WORLD is the local leader */
        MPI_Comm_rank(mycomm, &rank);

        if (rank == 0)
            MPI_Send(&cw_rank, 1, MPI_INT, cw_size-1, lldr_tag, MPI_COMM_WORLD);
        /* now create the inter-communicator */
        MPI_Intercomm_create( mycomm, local_leader,
                              MPI_COMM_WORLD, cw_size-1,
                              intercomm_tag, &intercomm);
    }
    else if (commnum == 2)
    {
        /* there's only one task in this comm */
        int local_leader = 0;
        int rmt_ldr;
        MPI_Status s;
        MPI_Recv(&rmt_ldr, 1, MPI_INT, MPI_ANY_SOURCE, lldr_tag, MPI_COMM_WORLD, &s);
        MPI_Intercomm_create( mycomm, local_leader,
                              MPI_COMM_WORLD, rmt_ldr,
                              intercomm_tag, &intercomm);
    }


    /* now let's play with our communicators and make sure they work */

    if (commnum == 0) {
        int max_of_ranks = 0;
        /* try it internally; */
        MPI_Reduce(&rank, &max_of_ranks, 1, MPI_INT, MPI_MAX, 0, mycomm);
        if (rank == 0) {
            printf("Within comm 0: maximum of ranks is %d\n", max_of_ranks);
            printf("Within comm 0: sum of ranks should be %d\n", max_of_ranks*(max_of_ranks+1)/2);
        }

        /* now sum it into the other comm; the "root" parameter here is
         * the root's rank in the remote group, and the receive buffer
         * is ignored on this (sending) side */
        MPI_Reduce(&rank, &max_of_ranks, 1, MPI_INT, MPI_SUM, 0, intercomm);
    }

    if (commnum == 2) {
        int sum_of_ranks = -999;
        int rootproc;

        /* get reduction data from other comm */

        if (rank == 0)   /* am I the root of this reduce? */
            rootproc = MPI_ROOT;
        else
            rootproc = MPI_PROC_NULL;

        MPI_Reduce(&rank, &sum_of_ranks, 1, MPI_INT, MPI_SUM, rootproc, intercomm);

        if (rank == 0) 
            printf("From comm 2: sum of ranks is %d\n", sum_of_ranks);
    }

    if (commnum == 0 || commnum == 2)
        MPI_Comm_free(&intercomm);

    MPI_Finalize();
    return 0;
}
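To try it out, something like the following should work (the file name is whatever you save it as; with 5 ranks, ranks 0 and 1 land in comm 0, ranks 2 and 3 in comm 1, and rank 4 becomes the IO task, so the IO task should report a sum of 0+1 = 1):

mpicc intercomm_reduce.c -o intercomm_reduce
mpirun -np 5 ./intercomm_reduce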
answered Feb 16 '23 by Jonathan Dursi