MPI Send and Recv Hangs with Buffer Size Larger Than 64kb

Tags: c, mpi, openmpi

I am trying to send data from process 0 to process 1. The program succeeds when the buffer is smaller than 64 KB, but hangs once the buffer grows beyond that. The following code should reproduce the hang as written; it should succeed if n is reduced to 8000 or less.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){
  int world_size, world_rank,
      count;
  MPI_Status status;

  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  if(world_size < 2){
    printf("Please add another process\n");
    /* Abort through MPI so the launcher tears down every rank. */
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  /* 8200 doubles = 65600 bytes, just over 64 KB. */
  int n = 8200;
  double *d = malloc(sizeof(double)*n);  /* receive buffer */
  double *c = malloc(sizeof(double)*n);  /* send buffer (contents are irrelevant here) */
  printf("malloc results %p %p\n", (void *)d, (void *)c);

  if(world_rank == 0){
    printf("sending\n");
    MPI_Send(c, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    printf("sent\n");
  }
  if(world_rank == 1){
    printf("recv\n");
    MPI_Recv(d, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

    /* Report how many elements actually arrived, and from where. */
    MPI_Get_count(&status, MPI_DOUBLE, &count);
    printf("recved, count:%d source:%d tag:%d error:%d\n", count, status.MPI_SOURCE, status.MPI_TAG, status.MPI_ERROR);
  }

  free(d);
  free(c);
  MPI_Finalize();
  return 0;
}

Output with n = 8200:
malloc results 0x1cb05f0 0x1cc0640
recv
malloc results 0x117d5f0 0x118d640
sending

Output with n = 8000:
malloc results 0x183c5f0 0x184c000
recv
malloc results 0x1ea75f0 0x1eb7000
sending
sent
recved, count:8000 source:0 tag:0 error:0

I found this question and this question, which are similar, but I believe the issue in both is a deadlock. I would not expect a deadlock here, because each process performs only a single send or a single receive.

EDIT: Added status checking.

EDIT2: It seems the issue was that I had OpenMPI installed but had also installed Intel's MPI implementation when I installed MKL. My code was being compiled with the OpenMPI header and libraries, but run with Intel's mpirun. Everything works as expected when I make sure to run with the mpirun executable from OpenMPI.

asked Apr 03 '16 17:04 by Ruvu


1 Answer

The issue was with having both Intel's MPI and OpenMPI installed. I saw that /usr/include/mpi.h was owned by OpenMPI, but mpicc and mpirun were from Intel's implementation:

$ which mpicc
/opt/intel/composerxe/linux/mpi/intel64/bin/mpicc
$ which mpirun
/opt/intel/composerxe/linux/mpi/intel64/bin/mpirun

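I was able to solve the issue by calling the compiler wrapper and the launcher by their full paths, compiling with

/usr/bin/mpicc

and running with

/usr/bin/mpirun

to ensure I used OpenMPI for both steps.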

Thanks to @Zulan and @gsamaras for the suggestion to check my installation.

answered Oct 02 '22 12:10 by Ruvu