The MPI basic data types correspond to the data types of the host language, except for MPI_BYTE and MPI_PACKED. My question is: what is the benefit of using those MPI basic data types? Or, equivalently, why is it bad to just use the host language's data types?
I read a tutorial by William Gropp et al. (http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiintro/ppframe.htm). Slide 31, "Why Datatypes", says (paraphrasing): since all data is labeled by type, an MPI implementation can support communication between machines with very different memory representations and lengths of elementary datatypes (heterogeneous communication); and specifying an application-oriented layout of data in memory (1) reduces memory-to-memory copies in the implementation and (2) allows the use of special hardware (scatter/gather) when available.
I don't grasp the explanation. First, if the elementary datatypes differ across machines, I don't see how using MPI datatypes resolves the difference, since the basic MPI datatypes correspond to the basic datatypes of the host language (the elementary datatypes). Second, why does this application-oriented layout of data in memory have the two benefits mentioned?
Any answer that addresses my original questions will be accepted. Any answer that relates my questions to William Gropp's explanation will also be accepted.
The short answer is that this system adds a level of strong-typing to MPI.
The long answer is that the purpose of the MPI datatypes is to tell the MPI functions what they're working with. So, for example, if you send an int from a little-endian machine to a big-endian one, MPI can do the byte-order conversion for you. Another, more everyday benefit is that MPI knows how big an MPI_DOUBLE is, so counts are given in elements and you don't have to scatter sizeof expressions everywhere.
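To make that concrete, here is a minimal sketch (assuming a job launched with at least two ranks) of sending an array of doubles; note that the count is 4 elements, not 4 * sizeof(double) bytes:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[4] = {1.0, 2.0, 3.0, 4.0};
    if (rank == 0) {
        /* The count is in elements: MPI_DOUBLE tells MPI
           how big each element is on this machine. */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}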
Note that the MPI datatypes are tags, not actual datatypes. In other words, you use
double d;
NOT
MPI_DOUBLE d;
First, if the elementary datatypes differ across machines, I don't see how using MPI datatypes resolves the difference, since the basic MPI datatypes correspond to the basic datatypes of the host language (the elementary datatypes).
Because a given MPI datatype does not need to refer to the same elementary type on two different machines. MPI_INT could be an int on one machine and a long on the other. This is especially useful in C++, since the C++ standard doesn't specify a byte size for the various integral types, so an int may in fact have more bits on one machine than on the other.
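You can see the mapping for yourself: MPI_Type_size reports how many bytes a datatype occupies on the machine the program is running on. A small sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size;
    /* Ask the library how many bytes MPI_INT occupies on THIS
       machine; the answer may differ on another machine. */
    MPI_Type_size(MPI_INT, &size);
    printf("MPI_INT is %d bytes here (sizeof(int) == %zu)\n",
           size, sizeof(int));

    MPI_Finalize();
    return 0;
}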
Second, why does this application-oriented layout of data in memory have the two benefits mentioned?
Look at the arguments of MPI_Send(). It takes a void* to the start of the data and the number of elements to send, and it assumes that the elements are lined up contiguously in memory, one after the other, all of the same type. In all but the luckiest of cases, this will not be true in your application. Even if you just have a simple array of structs (where the members of the struct are not all the same type), the only way to send those structs without user-defined MPI datatypes would be to copy the first member of each struct into a separate array and send it, then copy the second member of each struct into another array and send it, and so on. Derived MPI datatypes allow you to pull data directly from where it is, without rearranging or copying it, as in the sketch below.
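For example, here is a sketch (the Particle struct is just a made-up illustration) of building a derived datatype with MPI_Type_create_struct so that an array of structs can be sent in place:

#include <mpi.h>
#include <stddef.h>  /* offsetof */

typedef struct {
    int    id;
    double value;
} Particle;

/* Describe Particle's actual layout in memory, padding included. */
MPI_Datatype make_particle_type(void) {
    int          blocklens[2] = {1, 1};
    MPI_Aint     displs[2]    = { offsetof(Particle, id),
                                  offsetof(Particle, value) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    MPI_Datatype tmp, particle_type;
    MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
    /* Resize the extent to sizeof(Particle) so that consecutive
       array elements are found at the right addresses. */
    MPI_Type_create_resized(tmp, 0, sizeof(Particle), &particle_type);
    MPI_Type_commit(&particle_type);
    MPI_Type_free(&tmp);
    return particle_type;
}

With that in hand, MPI_Send(particles, 100, particle_type, dest, tag, MPI_COMM_WORLD) ships 100 structs straight from where they live, with no repacking on your side.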
I'm not sure what the second point is supposed to refer to, though.