
What's the benefit of MPI Datatype?

Tags: types, mpi

The MPI basic datatypes correspond to the data types of the host language, except for MPI_BYTE and MPI_PACKED. My question is: what is the benefit of using those MPI basic datatypes? Or, equivalently, why is it bad to just use the host language data types?

I read a tutorial by William Gropp et al. Slide 31, "Why Datatypes", says:

  • Since all data is labeled by type, an MPI implementation can support communication between processes on machines with very different memory representations and lengths of elementary datatypes (heterogeneous communication).
  • Specifying application-oriented layout of data in memory
    • reduces memory-to-memory copies in the implementation
    • allows the use of special hardware (scatter/gather) when available

(http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiintro/ppframe.htm)

I don't grasp the explanation. First, if elementary datatypes are different, I don't see why using MPI datatypes can resolve the difference since the basic MPI datatypes correspond to basic datatype of host language (elementary datatypes). Second, why this application-oriented layout of data in memory has the two benefits mentioned?

Any answer that addresses my original questions will be accepted. Any answer that resolves my questions about William Gropp's explanation will also be accepted.

user2196452 asked Sep 27 '13 19:09


2 Answers

The short answer is that this system adds a level of strong typing to MPI.

The long answer is that the purpose of the MPI datatypes is to tell the MPI functions what they're working with. So, for example, if you send an int from a little-endian machine to a big-endian one then MPI can do the byte order conversion for you. Another more common benefit is that MPI knows how big an MPI_DOUBLE is, so you don't have to have a bunch of sizeof statements everywhere.

Note that the MPI datatypes are tags, not actual datatypes. In other words, you use

double d;

NOT

MPI_DOUBLE d;
Adam answered Oct 11 '22 19:10


First, if elementary datatypes are different, I don't see why using MPI datatypes can resolve the difference since the basic MPI datatypes correspond to basic datatype of host language (elementary datatypes).

Because a given MPI datatype does not need to have the same representation on two different machines. MPI_INT always refers to the local C int, but that int may be 32 bits on one machine and have a different width, or a different byte order, on the other. Since the message is labeled MPI_INT, the implementation knows what it is converting. This is especially useful in C and C++, since neither standard fixes the byte size of the integral types, so an int may in fact have more bits on one machine than on the other.

Second, why this application-oriented layout of data in memory has the two benefits mentioned?

Look at the arguments of MPI_Send(). It takes a void* to the start of the data, the number of elements to send, and the datatype of those elements. With a basic datatype, it assumes that the elements are lined up contiguously in memory, one after the other, and are all of the same type. In all but the luckiest of cases, this will not be true in your application. Even if you just have a simple array of structs (where the members of the struct are not all the same type), the straightforward way to send these structs using only basic datatypes would be to copy the first member from each struct to a separate array, send it, then copy the second member from each struct to a different array, send it, and so forth. Derived MPI datatypes allow you to pull data directly from where it is, without rearranging or copying it.

I'm not sure what the second point is supposed to refer to, though.

suszterpatt answered Oct 11 '22 19:10