I'm new to MPI. I have 4 processes: processes 1 through 3 populate a vector and send it to process 0, and process 0 collects the vectors into one very long vector. I have code that works (too long to post), but process 0's recv operation is clumsy and very slow.
In abstract, the code does the following:
MPI::Init();
int id = MPI::COMM_WORLD.Get_rank();
if (id > 0) {
    double* my_array = new double[n*m]; // n, m are ints
    Populate(my_array, id);
    MPI::COMM_WORLD.Send(my_array, n*m, MPI::DOUBLE, 0, 50);
}
if (id == 0) {
    double* all_arrays = new double[3*n*m];
    /* Slow code starts here */
    double startcomm = MPI::Wtime();
    for (int i = 1; i <= 3; i++) {
        MPI::COMM_WORLD.Recv(&all_arrays[(i-1)*m*n], n*m, MPI::DOUBLE, i, 50);
    }
    double endcomm = MPI::Wtime();
    // Process 0 has more operations...
}
MPI::Finalize();
It turns out that endcomm - startcomm accounts for 50% of the total run time (0.7 seconds out of 1.5 seconds for the whole program). Is there a better way to receive the vectors from processes 1-3 and store them in process 0's all_arrays?
I checked out MPI::Comm::Gather, but I'm not sure how to use it. In particular, will it allow me to specify that process 1's array is the first array in all_arrays, process 2's array the second, etc.? Thanks.
Edit: I removed the "slow" loop, and instead put the following between the "if" blocks:
MPI_Gather(my_array,n*m,MPI_DOUBLE,
&all_arrays[(id-1)*m*n],n*m,MPI_DOUBLE,0,MPI_COMM_WORLD);
The same slow performance resulted. Does this have something to do with the fact that the root process "waits" for each individual receive to complete before attempting the next one? Or is that not the right way to think about it?
Yes, MPI_Gather will do exactly that. From the ANL man page for MPI_Gather:
int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
void *recvbuf, int recvcnt, MPI_Datatype recvtype,
int root, MPI_Comm comm)
Here, sendbuf is your array on each process (my_array), and recvbuf is the long array (all_arrays) on the receiving process into which the short arrays are gathered. Each process's short array is copied into its own contiguous slot in the long array, so you don't need to place it yourself: the blocks are laid out contiguously in recvbuf, in rank order.
EDIT:
If the receiving process does not contribute a block of its own to the gather (as in your code, where rank 0 only receives), you may want to use MPI_Gatherv instead (thanks to @HristoIliev for pointing this out).
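A hedged sketch of that variant follows, again using the C API. The receive counts and displacements are my assumptions matching the layout in the question: rank 0 contributes zero elements, so all_arrays only needs 3*n*m doubles and rank i's block starts at offset (i-1)*n*m:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int id, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // 4 in the question's setup

    const int n = 4, m = 2;                 // placeholder dimensions
    std::vector<double> my_array;
    if (id > 0)
        my_array.assign(n * m, static_cast<double>(id));

    std::vector<double> all_arrays;
    std::vector<int> recvcounts(size), displs(size);
    if (id == 0) {
        all_arrays.resize(static_cast<size_t>(size - 1) * n * m);
        recvcounts[0] = 0;                  // root contributes nothing
        displs[0]     = 0;
        for (int i = 1; i < size; i++) {
            recvcounts[i] = n * m;
            displs[i]     = (i - 1) * n * m; // rank i's block starts here
        }
    }

    // Rank 0 sends 0 elements; ranks 1..size-1 each send n*m doubles,
    // which land at displs[i] in all_arrays on the root.
    MPI_Gatherv(my_array.data(), (id > 0) ? n * m : 0, MPI_DOUBLE,
                all_arrays.data(), recvcounts.data(), displs.data(),
                MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

The recvcounts and displs arrays are only significant at the root, which is why they are filled in there alone.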