I would like to exchange data between different CUDA-devices by means of CUDA-aware MPI as described in this article. As I understand it, the following code ought to do the job:
#include <mpi.h>
#include <cuda_runtime.h>

int main( int argc, char *argv[] )
{
    int rank;
    float *ptr = NULL;
    const size_t elements = 32;
    MPI_Status status;

    MPI_Init( NULL, NULL );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    cudaMalloc( (void**)&ptr, elements * sizeof(float) );

    if( rank == 0 )
        MPI_Send( ptr, elements, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    if( rank == 1 )
        MPI_Recv( ptr, elements, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status );

    cudaFree( ptr );
    MPI_Finalize();

    return 0;
}
Unfortunately, this program crashed with a segfault when executed on two processes, giving the following message:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x210000
[ 0] /lib64/libc.so.6[0x39d94326a0]
[ 1] /lib64/libc.so.6(memcpy+0xd2)[0x39d9489742]
[ 2] /usr/lib64/openmpi/lib/libopen-pal.so.6(opal_convertor_pack+0x18e)[0x2b750326cb1e]
[ 3] /usr/lib64/openmpi/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_sendi+0x3dc)[0x2b7507c2252c]
[ 4] /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(+0x890f)[0x2b75086ec90f]
[ 5] /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x499)[0x2b75086ed939]
[ 6] /usr/lib64/openmpi/lib/libmpi.so.1(PMPI_Send+0x1dd)[0x2b7502d3ef8d]
[ 7] prog(main+0x98)[0x400d51]
[ 8] /lib64/libc.so.6(__libc_start_main+0xfd)[0x39d941ed5d]
[ 9] prog[0x400be9]
*** End of error message ***
I use OpenMPI 1.8.2 and nvcc 6.5; as far as I know, these versions are supposed to support this feature.
So, my question is: What am I doing wrong? Am I missing some point? I would very much appreciate any hints on how to obtain a minimal working example!
The segfault is almost certainly caused by passing a device pointer to MPI when MPI is expecting a host pointer. Only a properly built CUDA-aware MPI can accept a device pointer. It's not enough simply to have OpenMPI 1.8.2: you must have an OpenMPI version that was explicitly built with CUDA-aware support enabled.
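You can check what your installed library was built with before going further. Running ompi_info --parsable --all | grep mpi_built_with_cuda_support:value should report whether CUDA support was compiled in. Newer Open MPI releases (2.0 and later, so not your 1.8.2) additionally expose a compile-time macro and a runtime query in mpi-ext.h; a minimal sketch, assuming such a newer version, looks like this:

#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* Open MPI extension header; declares MPIX_Query_cuda_support() */
#endif

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );

    /* Compile-time check: MPIX_CUDA_AWARE_SUPPORT is defined by CUDA-aware builds (Open MPI >= 2.0) */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf( "Compile-time: this MPI library has CUDA-aware support.\n" );
#else
    printf( "Compile-time: CUDA-aware support is absent or cannot be determined.\n" );
#endif

    /* Run-time check: the library actually loaded may differ from the headers used at compile time */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf( "Run-time: %s\n", MPIX_Query_cuda_support()
            ? "this MPI library has CUDA-aware support."
            : "this MPI library does NOT have CUDA-aware support." );
#endif

    MPI_Finalize();
    return 0;
}

If these checks come back negative, the code in the question will keep segfaulting regardless of the MPI release number.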
For OpenMPI, start here. Excerpting:
CUDA-aware support means that the MPI library can send and receive GPU buffers directly. This feature exists in the Open MPI 1.7 series and later. The support is being continuously updated so different levels of support exist in different versions.
Configuring Open MPI 1.7, 1.7.1 and 1.7.2
--with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
DIR/lib, and DIR/lib64
--with-cuda-libdir=DIR Search for cuda libraries in DIR
Here are some examples of configure commands that enable CUDA support.
Searches in default locations. Looks for cuda.h in /usr/local/cuda/include and libcuda.so in /usr/lib64.
./configure --with-cuda
Searches for cuda.h in /usr/local/cuda-v4.0/cuda/include and libcuda.so in default location of /usr/lib64.
./configure --with-cuda=/usr/local/cuda-v4.0/cuda
Searches for cuda.h in /usr/local/cuda-v4.0/cuda/include and libcuda.so in /usr/lib64. (same as previous one)
./configure --with-cuda=/usr/local/cuda-v4.0/cuda --with-cuda-libdir=/usr/lib64
If the cuda.h or libcuda.so files cannot be found, then the configure will abort.
Note: There is a bug in Open MPI 1.7.2 such that you will get an error if you configure the library with --enable-static. To get around this error, add the following to your configure line and reconfigure. This disables the build of the PML BFO which is largely unused anyways. This bug is fixed in Open MPI 1.7.3.
--enable-mca-no-build=pml-bfo
Configuring Open MPI 1.7.3 and later
With Open MPI 1.7.3 and later the libcuda.so library is loaded dynamically so there is no need to specify a path to it at configure time. Therefore, all you need is the path to the cuda.h header file.
Searches in default locations. Looks for cuda.h in /usr/local/cuda/include.
./configure --with-cuda
Searches for cuda.h in /usr/local/cuda-v5.0/cuda/include.
./configure --with-cuda=/usr/local/cuda-v5.0/cuda
Note that you cannot configure with --disable-dlopen, as that will break the ability of the Open MPI library to dynamically load libcuda.so.
See this FAQ entry for details on how to use the CUDA support.
Note that these instructions assume some familiarity with building OpenMPI: it's not enough simply to run ./configure ...; there are make and make install steps after that. But the above configuration commands are what differentiate a CUDA-aware OpenMPI build from an ordinary one.
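If rebuilding Open MPI is not immediately possible, the usual workaround is to stage each transfer through host memory, so that MPI only ever sees host pointers. A minimal sketch mirroring the buffer names and sizes from your example (the cudaMemcpy calls and the host buffer are the only additions):

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int rank;
    const size_t elements = 32;
    float *d_ptr = NULL;   /* device buffer */
    float *h_ptr = NULL;   /* host staging buffer handed to MPI */
    MPI_Status status;

    MPI_Init( NULL, NULL );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    cudaMalloc( (void**)&d_ptr, elements * sizeof(float) );
    h_ptr = (float*)malloc( elements * sizeof(float) );

    if( rank == 0 )
    {
        /* device -> host, then hand the host pointer to MPI */
        cudaMemcpy( h_ptr, d_ptr, elements * sizeof(float), cudaMemcpyDeviceToHost );
        MPI_Send( h_ptr, elements, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    }
    if( rank == 1 )
    {
        /* receive into the host buffer, then copy host -> device */
        MPI_Recv( h_ptr, elements, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status );
        cudaMemcpy( d_ptr, h_ptr, elements * sizeof(float), cudaMemcpyHostToDevice );
    }

    free( h_ptr );
    cudaFree( d_ptr );
    MPI_Finalize();
    return 0;
}

With a properly built CUDA-aware MPI these staging copies become unnecessary and the original code from the question should work as written; if you stay with the staging variant, pinned host memory (cudaMallocHost) would speed up the device/host copies.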