I want to parallelize a serial C code on a 100-core distributed-memory cluster: 25 blades with 4 cores each, connected by InfiniBand. Until now I have just used PBS to spread several serial runs of the program across the different nodes. Now I wonder:
OpenMP is for shared-memory computers; I believe you can't use it across the nodes of a distributed-memory cluster. So you will have to use MPI.
A good MPI tutorial is: https://computing.llnl.gov/tutorials/mpi/
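The core idea in most MPI programs is that every process runs the same code and uses its rank to decide which part of the work it owns. A minimal sketch of that arithmetic in plain C, so it can be checked without an MPI installation; `block_range` is a hypothetical helper, and splitting a loop into contiguous blocks is only one of several possible decompositions:

```c
#include <assert.h>

/* Hypothetical helper: split n_iter loop iterations into nprocs nearly
 * equal contiguous blocks and return the half-open range [*start, *end)
 * owned by process `rank`. The first (n_iter % nprocs) ranks get one
 * extra iteration, so no two ranks differ by more than one iteration. */
void block_range(long n_iter, int nprocs, int rank, long *start, long *end)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && n_iter >= 0);
    long base = n_iter / nprocs; /* iterations every rank gets */
    long rem = n_iter % nprocs;  /* leftover iterations to hand out */
    /* ranks below rem each own base + 1 iterations, the rest own base */
    *start = rank * base + (rank < rem ? rank : rem);
    *end = *start + base + (rank < rem ? 1 : 0);
}
```

In an actual MPI program you would obtain `rank` and `nprocs` from `MPI_Comm_rank(MPI_COMM_WORLD, &rank)` and `MPI_Comm_size(MPI_COMM_WORLD, &nprocs)`, loop over your own `[start, end)` range, and then combine the per-process partial results, for example with `MPI_Reduce`. On your cluster, 1000 iterations over 100 processes would give each process exactly 10.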
Distributed memory effectively rules out OpenMP, which is for shared-memory computing. MPI is a standard, and OpenMPI is one implementation of that standard (there are others, such as MPICH and LAM/MPI), so use MPI through whichever implementation is available to you.
Use MPI; OpenMPI is a perfectly respectable implementation of it. However, I think it's relatively unusual to find a cluster like yours without an MPI installation, so a better choice might be the MPI installation you already have. You should certainly speak to the system's administrators about this, and you should certainly not try to install OpenMPI on a cluster without knowing what you are doing.
MPI tutorials are available all over the place. Here's one good place to start.
PBS is a job scheduling system. On a cluster such as yours you would typically have both an MPI installation and a job scheduler; if it is not PBS, then Grid Engine is the most likely.
As you've already discovered, you can use PBS (or Grid Engine, for that matter) to dispatch multiple serial jobs to a cluster. You can also use it to dispatch a single parallel job to a cluster for execution on however many processors you ask for. Your question raises the possibility, though, that your problem is embarrassingly parallel and that MPI may be overkill for you. Google the term "embarrassingly parallel" before you commit yourself to parallelising your program, unless you want to do it for the sheer enjoyment which will undoubtedly result.
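In the embarrassingly parallel case no MPI is needed: each independent serial run only has to know which piece of work it owns. A minimal sketch in C, assuming the job script passes each run a task index on its command line (schedulers with job arrays expose such an index in an environment variable whose name varies, e.g. `PBS_ARRAYID` under Torque, `PBS_ARRAY_INDEX` under PBS Pro, `SGE_TASK_ID` under Grid Engine; check your local documentation). `parse_task_index` and `param_for_task` are hypothetical names, and the evenly spaced parameter sweep is just one possible mapping from index to work:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a task index passed on the command line (e.g. argv[1]).
 * Returns -1 if the string is not a plain non-negative integer. */
long parse_task_index(const char *s)
{
    char *endp;
    errno = 0;
    long v = strtol(s, &endp, 10);
    if (errno != 0 || endp == s || *endp != '\0' || v < 0)
        return -1;
    return v;
}

/* Map task index `task` (0 <= task < ntasks) to an evenly spaced
 * parameter value in [lo, hi]; each serial run then works on one value. */
double param_for_task(long task, long ntasks, double lo, double hi)
{
    assert(ntasks >= 1 && task >= 0 && task < ntasks);
    if (ntasks == 1)
        return lo; /* a single task just uses the lower bound */
    return lo + (hi - lo) * (double)task / (double)(ntasks - 1);
}
```

In `main` you would call `parse_task_index(argv[1])`, reject a negative result, and pass the index to `param_for_task` together with the total number of tasks submitted; the scheduler then runs the 100 copies wherever cores are free, with no communication between them.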