I would like to exchange data between different CUDA-devices by means of CUDA-aware MPI as described in this article. As I understand it, the following code ought to do the job:
#include <mpi.h>
#include <cuda_runtime.h>

int main( int argc, char *argv[] )
{
    int rank;
    float *ptr = NULL;
    const size_t elements = 32;
    MPI_Status status;

    MPI_Init( NULL, NULL );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    cudaMalloc( (void**)&ptr, elements * sizeof(float) );

    if( rank == 0 )
        MPI_Send( ptr, elements, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    if( rank == 1 )
        MPI_Recv( ptr, elements, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status );

    cudaFree( ptr );
    MPI_Finalize();

    return 0;
}
Unfortunately, this program crashed with a segfault when executed on two processes, giving the following message:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x210000
[ 0] /lib64/libc.so.6[0x39d94326a0]
[ 1] /lib64/libc.so.6(memcpy+0xd2)[0x39d9489742]
[ 2] /usr/lib64/openmpi/lib/libopen-pal.so.6(opal_convertor_pack+0x18e)[0x2b750326cb1e]
[ 3] /usr/lib64/openmpi/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_sendi+0x3dc)[0x2b7507c2252c]
[ 4] /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(+0x890f)[0x2b75086ec90f]
[ 5] /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x499)[0x2b75086ed939]
[ 6] /usr/lib64/openmpi/lib/libmpi.so.1(PMPI_Send+0x1dd)[0x2b7502d3ef8d]
[ 7] prog(main+0x98)[0x400d51]
[ 8] /lib64/libc.so.6(__libc_start_main+0xfd)[0x39d941ed5d]
[ 9] prog[0x400be9]
*** End of error message ***
I use OpenMPI 1.8.2 and nvcc 6.5; as far as I know, these versions are supposed to support this feature.
So, my question is: What am I doing wrong? Am I missing some point? I would very much appreciate any hints on how to obtain a minimal working example!
The segfault is almost certainly caused by passing a device pointer to MPI when MPI is expecting a host pointer. Only a properly built CUDA-aware MPI can accept a device pointer. It's not enough simply to have OpenMPI 1.8.2: you must have an OpenMPI version that was explicitly built with CUDA-aware support enabled.
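You can check what your installed library was built with before going further. Running ompi_info --parsable --all | grep mpi_built_with_cuda_support:value should report whether CUDA support was compiled in. Newer Open MPI releases (2.0 and later, so not your 1.8.2) additionally expose a compile-time macro and a runtime query in mpi-ext.h; a minimal sketch, assuming such a newer version, looks like this:

#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* Open MPI extension header; declares MPIX_Query_cuda_support() */
#endif

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );

    /* Compile-time check: MPIX_CUDA_AWARE_SUPPORT is defined by CUDA-aware builds (Open MPI >= 2.0) */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf( "Compile-time: this MPI library has CUDA-aware support.\n" );
#else
    printf( "Compile-time: CUDA-aware support is absent or cannot be determined.\n" );
#endif

    /* Run-time check: the library actually loaded may differ from the headers used at compile time */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf( "Run-time: %s\n", MPIX_Query_cuda_support()
            ? "this MPI library has CUDA-aware support."
            : "this MPI library does NOT have CUDA-aware support." );
#endif

    MPI_Finalize();
    return 0;
}

If these checks come back negative, the code in the question will keep segfaulting regardless of the MPI release number.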
For OpenMPI, start here. Excerpting:
CUDA-aware support means that the MPI library can send and receive GPU buffers directly. This feature exists in the Open MPI 1.7 series and later. The support is being continuously updated so different levels of support exist in different versions.
Configuring Open MPI 1.7, 1.7.1 and 1.7.2
--with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
DIR/lib, and DIR/lib64
--with-cuda-libdir=DIR Search for cuda libraries in DIR
Here are some examples of configure commands that enable CUDA support.
Searches in default locations. Looks for cuda.h in /usr/local/cuda/include and libcuda.so in /usr/lib64.
./configure --with-cuda
Searches for cuda.h in /usr/local/cuda-v4.0/cuda/include and libcuda.so in default location of /usr/lib64.
./configure --with-cuda=/usr/local/cuda-v4.0/cuda
Searches for cuda.h in /usr/local/cuda-v4.0/cuda/include and libcuda.so in /usr/lib64. (same as previous one)
./configure --with-cuda=/usr/local/cuda-v4.0/cuda --with-cuda-libdir=/usr/lib64
If the cuda.h or libcuda.so files cannot be found, then the configure will abort.
Note: There is a bug in Open MPI 1.7.2 such that you will get an error if you configure the library with --enable-static. To get around this error, add the following to your configure line and reconfigure. This disables the build of the PML BFO which is largely unused anyways. This bug is fixed in Open MPI 1.7.3.
--enable-mca-no-build=pml-bfo
Configuring Open MPI 1.7.3 and later
With Open MPI 1.7.3 and later the libcuda.so library is loaded dynamically so there is no need to specify a path to it at configure time. Therefore, all you need is the path to the cuda.h header file.
Searches in default locations. Looks for cuda.h in /usr/local/cuda/include.
./configure --with-cuda
Searches for cuda.h in /usr/local/cuda-v5.0/cuda/include.
./configure --with-cuda=/usr/local/cuda-v5.0/cuda
Note that you cannot configure with --disable-dlopen, as that will break the ability of the Open MPI library to dynamically load libcuda.so.
See this FAQ entry for details on how to use the CUDA support.
Note that these instructions assume some familiarity with building OpenMPI: it's not enough simply to run ./configure ...; there are make and make install steps after that. But the above configuration commands are what differentiate a CUDA-aware OpenMPI build from an ordinary one.
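If rebuilding Open MPI is not immediately possible, the usual workaround is to stage each transfer through host memory, so that MPI only ever sees host pointers. A minimal sketch mirroring the buffer names and sizes from your example (the cudaMemcpy calls and the host buffer are the only additions):

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int rank;
    const size_t elements = 32;
    float *d_ptr = NULL;   /* device buffer */
    float *h_ptr = NULL;   /* host staging buffer handed to MPI */
    MPI_Status status;

    MPI_Init( NULL, NULL );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    cudaMalloc( (void**)&d_ptr, elements * sizeof(float) );
    h_ptr = (float*)malloc( elements * sizeof(float) );

    if( rank == 0 )
    {
        /* device -> host, then hand the host pointer to MPI */
        cudaMemcpy( h_ptr, d_ptr, elements * sizeof(float), cudaMemcpyDeviceToHost );
        MPI_Send( h_ptr, elements, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    }
    if( rank == 1 )
    {
        /* receive into the host buffer, then copy host -> device */
        MPI_Recv( h_ptr, elements, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status );
        cudaMemcpy( d_ptr, h_ptr, elements * sizeof(float), cudaMemcpyHostToDevice );
    }

    free( h_ptr );
    cudaFree( d_ptr );
    MPI_Finalize();
    return 0;
}

With a properly built CUDA-aware MPI these staging copies become unnecessary and the original code from the question should work as written; if you stay with the staging variant, pinned host memory (cudaMallocHost) would speed up the device/host copies.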