I have a school project to do matrix multiplication on an HPC distributed system.
I need to read in a matrix from a parallel I/O system and use PBLACS to perform the matrix multiplication in parallel on many compute nodes (processors). The data must be read in using MPI-IO calls. I know PBLACS uses block-cyclic distributions to perform the multiplication.
The professor has not given us much information on MPI-IO, and I am having trouble finding many resources on it. Specifically, is there a way to read a matrix in from a parallel I/O system in a block-cyclic manner and plug it straight into PBLACS pdgemm?
Any pointers to useful resources would be much appreciated. I am a bit short on time and getting frustrated with the lack of direction on this project.
This is actually relatively straightforward to do (if you already know something about BLACS/ScaLAPACK and MPI-IO!), but even then the documentation, online or otherwise, is, as you've discovered, somewhat poor.
The first thing to know about MPI-IO is that it lets you use normal MPI data types to specify each process's "view" of the file, and then read only the data that falls into that view. At our centre we have slides and source code for a half-day course on parallel I/O; the first third or so covers the basics of MPI-IO. There are slides here and sample source code here.
The second thing to know is that MPI has a built-in way to create "distributed array" data types, one combination of which lets you lay out a block-cyclic data distribution; that's discussed in general terms in my answer to the question "What is the difference between darray and subarray in MPI?".
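To make those two points concrete before the full example further down, here's a minimal sketch that does nothing but create the block-cyclic darray type and read an n x n matrix of reals through it. The file name, matrix size, block size, and the assumption of a headerless, column-major file of 4-byte reals are all placeholders for the sketch:
! Minimal MPI-IO + darray sketch: each rank builds a datatype describing its
! block-cyclic piece of an n x n matrix of reals, sets that type as its file
! view, and a collective read fills only the local piece.
! ('matrix.dat', n, and nb are assumptions; the file is assumed to hold just
! the n*n reals, column-major, with no header.)
program readdarray
use mpi
implicit none
integer :: n, nb, ierr, rank, procs
integer :: infile, darray_t, locsize, nelements
integer, dimension(2) :: pdims, dims, distribs, dargs
integer, dimension(MPI_STATUS_SIZE) :: mpistatus
integer(kind=MPI_OFFSET_KIND) :: disp
real, allocatable :: locA(:)
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, procs, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
n = 1000 ! assumed global matrix size
nb = 64 ! assumed block size
pdims = 0 ! let MPI pick a 2d process grid
call MPI_Dims_create(procs, 2, pdims, ierr)
dims = [n, n]
distribs = [MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC]
dargs = [nb, nb]
call MPI_Type_create_darray(procs, rank, 2, dims, distribs, dargs, &
pdims, MPI_ORDER_FORTRAN, MPI_REAL, darray_t, ierr)
call MPI_Type_commit(darray_t, ierr)
call MPI_Type_size(darray_t, locsize, ierr) ! bytes this rank will own
nelements = locsize/4 ! 4-byte reals assumed
allocate(locA(nelements))
call MPI_File_open(MPI_COMM_WORLD, 'matrix.dat', MPI_MODE_RDONLY, &
MPI_INFO_NULL, infile, ierr)
disp = 0
call MPI_File_set_view(infile, disp, MPI_REAL, darray_t, 'native', &
MPI_INFO_NULL, ierr)
call MPI_File_read_all(infile, locA, nelements, MPI_REAL, mpistatus, ierr)
call MPI_File_close(infile, ierr)
call MPI_Finalize(ierr)
end program readdarray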
So that means that if you have a binary file containing a big matrix, you can read it in with MPI-IO using MPI_Type_create_darray, and it will come in distributed across tasks in a block-cyclic way. Then it's just a matter of doing the BLACS or ScaLAPACK call. An example program that uses the ScaLAPACK psgemv for matrix-vector multiplication (rather than psgemm) is listed in my answer to a question on the Computational Science Stack Exchange.
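Since the question asks specifically about pdgemm, here is a rough sketch of what that step looks like once the local pieces and the array descriptors exist. The matrix size, block size, and local contents below are placeholders; in the real project the local arrays would be filled by the MPI-IO read rather than set to constants:
! Sketch: once A and B are block-cyclically distributed with matching
! descriptors, the multiplication itself is a single pdgemm call.
! (n, nb, and the constant matrix contents are assumptions for the sketch.)
program pdgemm_sketch
use mpi
implicit none
integer, external :: numroc
integer :: n, nb, ierr, procs, mpirank
integer :: icontxt, prow, pcol, myrow, mycol
integer :: myrows, mycols, info
integer, dimension(9) :: desc_a, desc_b, desc_c
integer, dimension(2) :: pdims
double precision, allocatable :: locA(:), locB(:), locC(:)
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, procs, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, mpirank, ierr)
n = 1000 ! assumed global matrix size
nb = 64 ! assumed block size
pdims = 0
call MPI_Dims_create(procs, 2, pdims, ierr)
prow = pdims(1)
pcol = pdims(2)
call blacs_get(-1, 0, icontxt)
call blacs_gridinit(icontxt, 'R', prow, pcol)
call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)
myrows = numroc(n, nb, myrow, 0, prow)
mycols = numroc(n, nb, mycol, 0, pcol)
allocate(locA(myrows*mycols), locB(myrows*mycols), locC(myrows*mycols))
locA = 1.0d0 ! placeholder data; really this would come from the MPI-IO read
locB = 1.0d0
locC = 0.0d0
call descinit(desc_a, n, n, nb, nb, 0, 0, icontxt, max(1,myrows), info)
call descinit(desc_b, n, n, nb, nb, 0, 0, icontxt, max(1,myrows), info)
call descinit(desc_c, n, n, nb, nb, 0, 0, icontxt, max(1,myrows), info)
! C = 1.0*A*B + 0.0*C
call pdgemm('N', 'N', n, n, n, 1.0d0, locA, 1, 1, desc_a, &
locB, 1, 1, desc_b, 0.0d0, locC, 1, 1, desc_c)
call blacs_gridexit(icontxt)
call MPI_Finalize(ierr)
end program pdgemm_sketch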
To give you an idea of how the pieces fit together, the following is a simple program which reads in a binary file containing a matrix (first the size of the square matrix, N, then the N^2 elements) and then calculates the eigenvalues and eigenvectors using ScaLAPACK's (new) pssyevr routine. It combines the MPI-IO, darray, and ScaLAPACK pieces. It's in Fortran, but the function calls are the same in C-based languages.
!
! Use MPI-IO to read a diagonal matrix distributed block-cyclically,
! use Scalapack to calculate its eigenvalues, and compare
! to expected results.
!
program darray
use mpi
implicit none
integer :: n, nb ! problem size and block size
integer :: myArows, myAcols ! size of local subset of global array
real :: p
real, dimension(:), allocatable :: myA, myZ
real, dimension(:), allocatable :: work
integer, dimension(:), allocatable :: iwork
real, dimension(:), allocatable :: eigenvals
real, dimension(:), allocatable :: expected
integer :: worksize, totwork, iworksize
integer, external :: numroc ! blacs routine
integer :: me, procs, icontxt, prow, pcol, myrow, mycol ! blacs data
integer :: info ! scalapack return value
integer, dimension(9) :: ides_a, ides_z ! scalapack array desc
integer :: clock
real :: calctime, iotime
character(len=128) :: filename
integer :: mpirank
integer :: ierr
integer, dimension(2) :: pdims, dims, distribs, dargs
integer :: infile
integer, dimension(MPI_STATUS_SIZE) :: mpistatus
integer :: darray
integer :: locsize, nelements
integer(kind=MPI_ADDRESS_KIND) :: lb, locextent
integer(kind=MPI_OFFSET_KIND) :: disp
integer :: nargs
integer :: m, nz
! Initialize MPI (for MPI-IO)
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD,procs,ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,mpirank,ierr)
! May as well get the process grid from MPI_Dims_create
pdims = 0
call MPI_Dims_create(procs, 2, pdims, ierr)
prow = pdims(1)
pcol = pdims(2)
! get filename
nargs = command_argument_count()
if (nargs /= 1) then
print *,'Usage: darray filename'
print *,' Where filename = name of binary matrix file.'
call MPI_Abort(MPI_COMM_WORLD,1,ierr)
endif
call get_command_argument(1, filename)
! find the size of the array we'll be using
call tick(clock)
call MPI_File_open(MPI_COMM_WORLD, trim(filename), MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
call MPI_File_read_all(infile,n,1,MPI_INTEGER,mpistatus,ierr)
call MPI_File_close(infile,ierr)
! create the darray that will read in the data.
nb = 64
if (nb > (N/prow)) nb = N/prow
if (nb > (N/pcol)) nb = N/pcol
dims = [n,n]
distribs = [MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC]
dargs = [nb, nb]
call MPI_Type_create_darray(procs, mpirank, 2, dims, distribs, dargs, &
pdims, MPI_ORDER_FORTRAN, MPI_REAL, darray, ierr)
call MPI_Type_commit(darray,ierr)
call MPI_Type_size(darray, locsize, ierr)
nelements = locsize/4 ! 4-byte reals
call MPI_Type_get_extent(darray, lb, locextent, ierr)
! Initialize local arrays
allocate(myA(nelements))
allocate(myZ(nelements))
allocate(eigenvals(n), expected(n))
! read in the data
call MPI_File_open(MPI_COMM_WORLD, trim(filename), MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
disp = 4 ! skip over the leading 4-byte integer N
call MPI_File_set_view(infile, disp, MPI_REAL, darray, "native", MPI_INFO_NULL, ierr)
call MPI_File_read_all(infile, myA, nelements, MPI_REAL, mpistatus, ierr)
call MPI_File_close(infile,ierr)
iotime = tock(clock)
if (mpirank == 0) then
print *,'I/O time = ', iotime
endif
! Initialize blacs processor grid
call tick(clock)
call blacs_pinfo (me,procs)
call blacs_get (-1, 0, icontxt)
call blacs_gridinit(icontxt, 'R', prow, pcol)
call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)
myArows = numroc(n, nb, myrow, 0, prow)
myAcols = numroc(n, nb, mycol, 0, pcol)
! Construct local arrays
! Global structure: matrix A of n rows and n columns
! Prepare array descriptors for ScaLAPACK
call descinit( ides_a, n, n, nb, nb, 0, 0, icontxt, myArows, info )
call descinit( ides_z, n, n, nb, nb, 0, 0, icontxt, myArows, info )
! Call ScaLAPACK library routine
allocate(work(1), iwork(1))
iwork(1) = -1
work(1) = -1.
call pssyevr( 'V', 'A', 'U', n, myA, 1, 1, ides_a, -1.e20, +1.e20, 1, n, &
m, nz, eigenvals, myZ, 1, 1, ides_z, work, -1, &
iwork, -1, info )
worksize = int(work(1))/2*3 ! pad the queried workspace sizes by ~50%
iworksize = iwork(1)/2*3
print *, 'Local workspace ', worksize
call MPI_Reduce(worksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
if (mpirank == 0) print *, ' total work space ', totwork
call MPI_Reduce(iworksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
if (mpirank == 0) print *, ' total iwork space ', totwork
deallocate(work,iwork)
allocate(work(worksize),iwork(iworksize))
call pssyevr( 'V', 'A', 'U', n, myA, 1, 1, ides_a, -1.e20, +1.e20, 1, n, &
m, nz, eigenvals, myZ, 1, 1, ides_z, work, worksize, &
iwork, iworksize, info )
if (info /= 0) then
print *, 'Error: info = ', info
else if (mpirank == 0) then
print *, 'Calculated ', m, ' eigenvalues and ', nz, ' eigenvectors.'
endif
! Deallocate the local arrays
deallocate(myA, myZ)
deallocate(work, iwork)
! End blacs for processors that are used
call blacs_gridexit(icontxt)
calctime = tock(clock)
! calculated the expected eigenvalues for a particular test matrix
p = 3. + sqrt((4. * n - 3.) * (n - 1.)*3./(n+1.))
expected(1) = p/(n*(5.-2.*n))
expected(2) = 6./(p*(n + 1.))
expected(3:n) = 1.
! Print results
if (me == 0) then
if (info /= 0) then
print *, 'Error -- info = ', info
endif
print *,'Eigenvalues L_infty err = ', &
maxval(abs(eigenvals-expected))
print *,'Compute time = ', calctime
endif
deallocate(eigenvals, expected)
call MPI_Finalize(ierr)
contains
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
end program darray
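To generate an input file in the layout the program above expects (the integer N followed by the N^2 single-precision elements, column-major, with no record markers), a serial helper along these lines works. It assumes 4-byte default integers and reals, and the matrix contents here are just a placeholder:
! Sketch of a serial writer producing the binary layout the reader expects:
! the integer n, then the n*n single-precision elements in column-major
! order, with no Fortran record markers (hence stream access).
program writemat
implicit none
integer :: n, i
real, allocatable :: a(:,:)
n = 1000 ! assumed matrix size
allocate(a(n,n))
a = 0.
do i = 1, n ! placeholder contents: identity matrix
a(i,i) = 1.
end do
open(unit=10, file='matrix.dat', access='stream', form='unformatted', &
status='replace')
write(10) n
write(10) a
close(10)
end program writemat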