In MPI, is MPI_Bcast purely a convenience function, or is there an efficiency advantage to using it instead of looping over all ranks and sending the same message to each of them?
Rationale: MPI_Bcast's behavior of sending the message to everyone, including the root, is inconvenient for me, so I'd rather not use it unless there's a good reason, or unless it can be made not to send the message to root.
Using MPI_Bcast will definitely be more efficient than rolling your own. A lot of work has been done in all MPI implementations to optimise collective operations based on factors such as the message size and the communication architecture.
For example, MPI_Bcast in MPICH2 uses a different algorithm depending on the size of the message. For short messages, a binomial tree is used to minimise processing load and latency. For long messages, it is implemented as a binomial tree scatter followed by an allgather.
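To make the latency argument concrete, here is a minimal sketch in plain C (no MPI calls) that compares a naive send loop with a tree broadcast in which the set of ranks holding the data doubles each round, the shape usually called a binomial tree. The function names are made up for illustration, the root is assumed to be rank 0, and this is not MPICH's actual code. It only counts communication steps and ignores bandwidth.

```c
#include <stdio.h>

/* Naive loop: the root issues one send per other rank, one after another. */
int linear_steps(int nprocs) {
    return nprocs > 1 ? nprocs - 1 : 0;
}

/* Tree broadcast: every rank that already has the data forwards it,
 * so the number of ranks holding the data doubles each round. */
int tree_rounds(int nprocs) {
    int rounds = 0;
    for (int have = 1; have < nprocs; have *= 2) {
        rounds++;
    }
    return rounds;
}

/* Print who sends to whom in each round, assuming the root is rank 0.
 * In the round with a given mask, ranks 0..mask-1 already hold the data
 * and each sends to rank + mask, if that rank exists. */
void print_tree_schedule(int nprocs) {
    int round = 0;
    for (int mask = 1; mask < nprocs; mask <<= 1, round++) {
        for (int src = 0; src < mask; src++) {
            int dst = src + mask;
            if (dst < nprocs) {
                printf("round %d: rank %d -> rank %d\n", round, src, dst);
            }
        }
    }
}
```

For 1024 ranks, linear_steps gives 1023 sequential sends from the root, while tree_rounds gives 10 rounds. A hand-written loop over ranks corresponds to the first case.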
In addition, HPC vendors often provide MPI implementations that make efficient use of the underlying interconnects, especially for collective operations. For example, an implementation can use a hardware-supported multicast scheme or bespoke algorithms that take advantage of the interconnect topology.
Collective communications can be much faster than rolling your own. All of the MPI implementations put a lot of work into making those routines fast.
If you routinely want to do collective-type operations on only a subset of tasks, then you probably want to create your own sub-communicators and use MPI_Bcast, etc. on those communicators.