I've been using MPI_Scatter / MPI_Gather for a variety of parallel computations. One thing I've noticed is that MPI_Barrier() is often called to synchronize processors, akin to the OpenMP barrier directive. While tweaking my code for a project, I commented out the MPI_Barrier() lines below and found that the computations were still correct. Why is this the case? I can understand why the first MPI_Barrier() is needed. The other processors don't need to wait after the scatter itself: as soon as they get the data from processor MASTER, they can begin their computations. But is an MPI_Barrier() ever needed AFTER an MPI_Gather, or does MPI_Gather already have an implicit barrier within?

Edit: Does the size of the data being processed matter in this case?
MPI_Scatter(sendingbuffer, sendcount, MPI_INT, localdata, sendcount,
            MPI_INT, MASTER_ID, MPI_COMM_WORLD);
// PERFORM SOME COMPUTATIONS ON localdata
MPI_Barrier(MPI_COMM_WORLD); // <--- I understand why this is needed
MPI_Gather(localdata, sendcount, MPI_INT, global, sendcount, MPI_INT,
           MASTER_ID, MPI_COMM_WORLD);
//MPI_Barrier(MPI_COMM_WORLD); <------ is this ever needed?
None of the barriers are needed!
MPI_Gather is a blocking operation; that is, the outputs are available after the call completes. That does not imply a barrier, because non-root ranks are allowed to, but not guaranteed to, complete before the root or the other ranks start their part of the operation. However, it is perfectly safe to access global on the MASTER_ID rank and to reuse localdata on any rank after the local call completes.
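To make that concrete, here is a minimal self-contained sketch of the pattern from the question with every barrier removed; the buffer names follow the question, and the element count and the doubling step are placeholders for the real computation:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER_ID 0

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int sendcount = 4;          /* elements per rank (placeholder) */
    int *sendingbuffer = NULL;
    int *global = NULL;
    if (rank == MASTER_ID) {
        sendingbuffer = malloc(nprocs * sendcount * sizeof(int));
        global = malloc(nprocs * sendcount * sizeof(int));
        for (int i = 0; i < nprocs * sendcount; ++i)
            sendingbuffer[i] = i;
    }

    int localdata[4];
    /* MPI_Scatter blocks until localdata is valid on this rank;
       no barrier is needed before the computation can start. */
    MPI_Scatter(sendingbuffer, sendcount, MPI_INT,
                localdata, sendcount, MPI_INT, MASTER_ID, MPI_COMM_WORLD);

    for (int i = 0; i < sendcount; ++i)
        localdata[i] *= 2;            /* stand-in for the real computation */

    /* When MPI_Gather returns on MASTER_ID, global is completely filled;
       when it returns on any rank, that rank may reuse localdata.
       No barrier is needed before or after. */
    MPI_Gather(localdata, sendcount, MPI_INT,
               global, sendcount, MPI_INT, MASTER_ID, MPI_COMM_WORLD);

    if (rank == MASTER_ID) {
        for (int i = 0; i < nprocs * sendcount; ++i)
            printf("%d ", global[i]);
        printf("\n");
        free(global);
        free(sendingbuffer);
    }

    MPI_Finalize();
    return 0;
}

Built with mpicc and run with, e.g., mpirun -np 4, the root prints the gathered results correctly even though no rank ever waits at a barrier.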
Synchronization with message-based MPI is different from shared-memory OpenMP. For blocking communication, usually no explicit synchronization is necessary: the result is guaranteed to be available after the call completes.
Synchronization of sorts is necessary for non-blocking communication, but it is done via MPI_Test/MPI_Wait on specific requests. Barriers might even provide a false sense of correctness if you tried to substitute an MPI_Wait with an MPI_Barrier. With one-sided communication it gets more complicated, and barriers can play a role.
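If the gather were the non-blocking MPI_Igather instead, the completion call is what matters. A sketch, reusing the variable names from the question:

MPI_Request req;
MPI_Igather(localdata, sendcount, MPI_INT,
            global, sendcount, MPI_INT, MASTER_ID, MPI_COMM_WORLD, &req);

/* ... overlap with work that touches neither localdata nor global ... */

/* MPI_Wait completes *this* request; only afterwards may localdata be
   reused and, on MASTER_ID, global be read. An MPI_Barrier here would
   synchronize the ranks but leave the gather request still pending. */
MPI_Wait(&req, MPI_STATUS_IGNORE);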
In fact, you only rarely need a barrier at all; prefer to avoid them so you don't introduce unnecessary synchronization.
Edit: Given the contradicting other answers, here is the relevant citation from the standard (MPI 3.1, Section 5.1), emphasis mine.
Collective operations can (but are not required to) complete as soon as the caller’s participation in the collective communication is finished. A blocking operation is complete as soon as the call returns. A nonblocking (immediate) call requires a separate completion call (cf. Section 3.7). The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer. It does not indicate that other processes in the group have completed or even started the operation (unless otherwise implied by the description of the operation). Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This statement excludes, of course, the barrier operation.
To address the recent edit: no, data sizes have no impact on correctness in this case. Data sizes in MPI sometimes do affect whether an incorrect MPI program deadlocks or not.
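For example (a sketch; rank, peer, sendbuf, recvbuf, and count are placeholders), the classic head-to-head exchange below is erroneous because both ranks issue a blocking send first:

int peer = 1 - rank;   /* assumes exactly two ranks */
/* Erroneous: both ranks send first. Small messages often succeed because
   the implementation buffers them internally (the "eager" protocol);
   once count exceeds the eager threshold, both MPI_Send calls block
   waiting for a matching receive, and the program deadlocks. */
MPI_Send(sendbuf, count, MPI_INT, peer, 0, MPI_COMM_WORLD);
MPI_Recv(recvbuf, count, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

The portable fix is MPI_Sendrecv or non-blocking calls, not a barrier.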