
MPI-IO: write subarray

I am starting to use MPI-IO and wrote a very simple example of the things I'd like to do with it; however, even though the code is simple and I took inspiration from examples I read here and there, I get a segmentation fault I do not understand.

The logic of the piece of code is very simple: each process handles a local array which is part of a global array I want to write. I create a subarray type with MPI_Type_Create_Subarray to describe that part. Then I just open the file, set a view and try to write the data. I get the segmentation fault during the call to MPI_File_Write_All.

Here is the code:

program test
  implicit none

  include "mpif.h"

  integer :: myrank, nproc, fhandle, ierr
  integer :: xpos, ypos
  integer, parameter :: loc_x=10, loc_y=10
  integer :: loc_dim
  integer :: nx=2, ny=2
  real(8), dimension(loc_x, loc_y) :: data
  integer :: written_arr
  integer, dimension(2) :: wa_size, wa_subsize, wa_start

  call MPI_Init(ierr)
  call MPI_Comm_Rank(MPI_COMM_WORLD, myrank, ierr)
  call MPI_Comm_Size(MPI_COMM_WORLD, nproc, ierr)

  xpos = mod(myrank, nx)
  ypos = mod(myrank/nx, ny)

  data = myrank

  loc_dim    = loc_x*loc_y
  wa_size    = (/ nx*loc_x, ny*loc_y /)
  wa_subsize = (/ loc_x, loc_y /)
  wa_start   = (/ xpos, ypos /)*wa_subsize
  call MPI_Type_Create_Subarray(2, wa_size, wa_subsize, wa_start &
       , MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, written_arr, ierr)
  call MPI_Type_Commit(written_arr, ierr)

  call MPI_File_Open(MPI_COMM_WORLD, "file.dat" &
       & , MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fhandle, ierr)
  call MPI_File_Set_View(fhandle, 0, MPI_DOUBLE_PRECISION, written_arr &
       , "native", MPI_INFO_NULL, ierr)
  call MPI_File_Write_All(fhandle, data, loc_dim, MPI_DOUBLE_PRECISION &
       , MPI_INFO_NULL, ierr)
  call MPI_File_Close(fhandle, ierr)

  call MPI_Finalize(ierr)

end program test

Any help would be highly appreciated!

asked Mar 23 '23 by MBR

1 Answer

The last argument to MPI_FILE_WRITE_ALL before the error output argument is an MPI status object, not an MPI info object. Making the call with MPI_INFO_NULL is therefore erroneous. If you are not interested in the status of the write operation, you should pass MPI_STATUS_IGNORE instead. Making the call with MPI_INFO_NULL might happen to work in some MPI implementations because of the specifics of how both constants are defined, but fail in others.
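The corrected call changes only the fifth argument of the original code:

call MPI_File_Write_All(fhandle, data, loc_dim, MPI_DOUBLE_PRECISION &
     , MPI_STATUS_IGNORE, ierr)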

For example, in Open MPI MPI_INFO_NULL is declared as:

parameter (MPI_INFO_NULL=0)

When passed instead of MPI_STATUS_IGNORE, it causes the C implementation of MPI_File_write_all to be called with the status argument pointing to a constant (read-only) memory location that holds the value of MPI_INFO_NULL (that is how Fortran implements passing constants: by address). When the C function is about to finish, it tries to fill in the status object, which results in an attempt to write to that read-only memory and ultimately leads to the segmentation fault.
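If you do want to inspect the completion status instead of ignoring it, declare a status array of size MPI_STATUS_SIZE and pass that (a sketch; wstatus is an illustrative name, not from the original code):

integer, dimension(MPI_STATUS_SIZE) :: wstatus  ! illustrative name
...
call MPI_File_Write_All(fhandle, data, loc_dim, MPI_DOUBLE_PRECISION &
     , wstatus, ierr)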


When writing new Fortran programs, it is advisable not to use the very old mpif.h interface, as it does not provide any error checking. Rather, one should use the mpi module, or even mpi_f08 once more MPI implementations become MPI-3.0 compliant. The beginning of your program should therefore look like:

program test
   use mpi
   implicit none
   ...
end program test
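For completeness, with the newer mpi_f08 bindings the MPI handles become distinct derived types instead of plain integers, which lets the compiler catch even more mistakes (a sketch based on the MPI-3.0 standard, not part of the original answer):

program test
   use mpi_f08
   implicit none
   type(MPI_File)     :: fhandle      ! file handles are derived types
   type(MPI_Datatype) :: written_arr  ! and so are datatype handles
   ...
end program test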

Once you use the mpi module instead of mpif.h, the compiler is able to perform type checking of the arguments to some MPI calls, including MPI_FILE_SET_VIEW, and spots an error:

test.f90(34): error #6285: There is no matching specific subroutine for this generic subroutine call.   [MPI_FILE_SET_VIEW]
  call MPI_File_Set_View(fhandle, 0, MPI_DOUBLE_PRECISION, written_arr &
-------^
compilation aborted for test.f90 (code 1)

The reason is that the second argument to MPI_FILE_SET_VIEW (the file displacement) is of type INTEGER(KIND=MPI_OFFSET_KIND), which is 64-bit on most modern platforms. The constant 0 is simply of default INTEGER type and is therefore 32-bit on most platforms. What happens with mpif.h is that the compiler passes a pointer to an INTEGER constant with a value of 0, but the subroutine interprets it as a pointer to a larger integer and treats the neighbouring memory as part of the constant's value. Thus the zero that you pass as an offset inside the file ends up being a non-zero value.

Replace the 0 in the MPI_FILE_SET_VIEW call with 0_MPI_OFFSET_KIND, or declare a constant of type INTEGER(KIND=MPI_OFFSET_KIND) with a value of zero and pass it:

call MPI_File_Set_View(fhandle, 0_MPI_OFFSET_KIND, MPI_DOUBLE_PRECISION, ...

or

integer(kind=MPI_OFFSET_KIND), parameter :: zero_off = 0
...
call MPI_File_Set_View(fhandle, zero_off, MPI_DOUBLE_PRECISION, ...

Both methods lead to an output file of size 3200 bytes, as expected: the global array is 20 x 20 double-precision values, i.e. 400 elements x 8 bytes each.
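Putting both fixes together, here is a corrected version of the whole program (a sketch under the same assumptions as the original, i.e. it is run with exactly nx*ny = 4 processes):

program test
  use mpi              ! the mpi module instead of mpif.h
  implicit none

  integer :: myrank, nproc, fhandle, ierr
  integer :: xpos, ypos
  integer, parameter :: loc_x = 10, loc_y = 10
  integer, parameter :: nx = 2, ny = 2
  integer :: loc_dim
  real(8), dimension(loc_x, loc_y) :: data
  integer :: written_arr
  integer, dimension(2) :: wa_size, wa_subsize, wa_start

  call MPI_Init(ierr)
  call MPI_Comm_Rank(MPI_COMM_WORLD, myrank, ierr)
  call MPI_Comm_Size(MPI_COMM_WORLD, nproc, ierr)

  ! Cartesian position of this rank in the nx x ny process grid
  xpos = mod(myrank, nx)
  ypos = mod(myrank/nx, ny)

  data = myrank

  loc_dim    = loc_x*loc_y
  wa_size    = (/ nx*loc_x, ny*loc_y /)
  wa_subsize = (/ loc_x, loc_y /)
  wa_start   = (/ xpos, ypos /)*wa_subsize
  call MPI_Type_Create_Subarray(2, wa_size, wa_subsize, wa_start &
       , MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, written_arr, ierr)
  call MPI_Type_Commit(written_arr, ierr)

  call MPI_File_Open(MPI_COMM_WORLD, "file.dat" &
       , MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fhandle, ierr)
  ! Fix 1: the displacement must be of kind MPI_OFFSET_KIND, not default INTEGER
  call MPI_File_Set_View(fhandle, 0_MPI_OFFSET_KIND, MPI_DOUBLE_PRECISION &
       , written_arr, "native", MPI_INFO_NULL, ierr)
  ! Fix 2: the fifth argument is a status, not an info; ignore it here
  call MPI_File_Write_All(fhandle, data, loc_dim, MPI_DOUBLE_PRECISION &
       , MPI_STATUS_IGNORE, ierr)
  call MPI_File_Close(fhandle, ierr)

  call MPI_Type_Free(written_arr, ierr)
  call MPI_Finalize(ierr)

end program test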

answered Apr 25 '23 by Hristo Iliev