When using MPI 3 shared memory, it occurred to me that writing to adjacent memory positions of a shared memory window simultaneously on different tasks seemingly does not work.
I guessed that MPI ignores possible cache conflicts and now my question is if that is correct and MPI indeed does not care about cache coherency, or if this is a quirk of the implementation, or if there is a completely different explanation to that behaviour?
This is a minimal example where, in Fortran, simultaneously writing to distinct addresses in a shared memory window causes a conflict (tested with intel MPI 2017, 2018, 2019 and GNU OpenMPI 3).
program testAlloc
use mpi
use, intrinsic :: ISO_C_BINDING, only: c_ptr, c_f_pointer
implicit none
integer :: ierr
integer :: window
integer(kind=MPI_Address_kind) :: wsize
type(c_ptr) :: baseptr
integer, pointer :: f_ptr
integer :: comm_rank
call MPI_Init(ierr)
! Each processor allocates one entry
wsize = 1
call MPI_WIN_ALLOCATE_SHARED(wsize,4,MPI_INFO_NULL,MPI_COMM_WORLD,baseptr,window,ierr)
! Convert to a fortran pointer
call c_f_pointer(baseptr, f_ptr)
! Now, assign some value simultaneously
f_ptr = 4
! For output, get the mpi rank
call mpi_comm_rank(MPI_COMM_WORLD, comm_rank, ierr)
! Output the assigned value - only one task reports 4, the others report junk
print *, "On task", comm_rank, "value is", f_ptr
call MPI_Win_free(window, ierr)
call MPI_Finalize(ierr)
end program
Curiously, the same program in C does seem to work as intended, which leads to the question if there is something wrong with the Fortran implementation, or the C program is just lucky (tested with the same MPI libraries)
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]){
MPI_Init(&argc, &argv);
// Allocate a single resource per task
MPI_Aint wsize = 1;
// Do a shared allocation
int *resource;
MPI_Win window;
MPI_Win_allocate_shared(wsize, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &resource, &window);
// For output clarification, get the mpi rank
int comm_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
// Assign some value
*resource = 4;
// Tell us the value - this seems to work
printf("On task %d the value is %d\n",comm_rank,*resource);
MPI_Win_free(&window);
MPI_Finalize();
}
From the MPI 3.1 standard (chapter 11.2.3 page 407)
MPI_WIN_ALLOCATE_SHARED(size, disp_unit, info, comm, baseptr, win)
IN size size of local window in bytes (non-negative integer)
Note the window size is in bytes and not in number of units.
So all you need is to use
wsize = 4
in Fortran (assuming your INTEGER size is indeed 4) and
wsize = sizeof(int);
in C
FWIW
C version seems to give the expected result most of the time, it is also incorrect and I am able to evidence this by running under the program under a debugger.volatile int * resource; in C to prevent the compiler from performing some optimizations that might impact the behavior of your app (and this is not needed here).If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With