I'm using multiprocessing Pool to run a parallelized simulation in Python, and it works well on a computer with multiple cores. Now I want to execute the program on a cluster using several nodes. I suppose multiprocessing does not apply to distributed memory, but mpi4py seems a good option. So what is the simplest mpi4py equivalent of this code:
from multiprocessing import Pool
pool = Pool(processes=16)
pool.map(functionName,parameters_list)
There is an MPIPool class implemented here. For an example of how I use it, check out this gist on GitHub.
There's an old package of mine that is built on mpi4py, which provides a functional parallel map for MPI jobs. It's not built for speed -- it was built to enable an MPI-parallel map from the interpreter onto a compute cluster (i.e. without the need to invoke mpiexec from the command line). Essentially:
>>> from pyina.launchers import MpiPool, MpiScatter
>>> pool = MpiPool()
>>> jobs = MpiScatter()
>>> def squared(x):
... return x**2
...
>>> pool.map(squared, range(4))
[0, 1, 4, 9]
>>> jobs.map(squared, range(4))
[0, 1, 4, 9]
This shows off the "worker pool" strategy and the "scatter-gather" strategy of distributing jobs to the workers. Of course, I wouldn't use it for a job as small as squared, because the overhead of spawning the MPI world is really quite high (much higher than setting up a multiprocessing Pool). However, if you have a big job to run, as you normally would on a cluster using MPI, then pyina can be a big benefit for you.
However, the real advantage of using pyina is that it can not only spawn jobs with MPI, but can also spawn jobs to a scheduler; pyina understands and abstracts the launch syntax for several schedulers. A typical call to a pyina map using a scheduler goes like this:
>>> # instantiate and configure a scheduler
>>> from pyina.schedulers import Torque
>>> config = {'nodes':'32:ppn=4', 'queue':'dedicated', 'timelimit':'11:59'}
>>> torque = Torque(**config)
>>>
>>> # instantiate and configure a worker pool
>>> from pyina.launchers import Mpi
>>> pool = Mpi(scheduler=torque)
>>>
>>> # do a blocking map on the chosen function
>>> pool.map(pow, [1,2,3,4], [5,6,7,8])
[1, 64, 2187, 65536]
Several common configurations are available as pre-configured maps. The following is identical to the above example:
>>> # instantiate and configure a pre-configured worker pool
>>> from pyina.launchers import TorqueMpiPool
>>> config = {'nodes':'32:ppn=4', 'queue':'dedicated', 'timelimit':'11:59'}
>>> pool = TorqueMpiPool(**config)
>>>
>>> # do a blocking map on the chosen function
>>> pool.map(pow, [1,2,3,4], [5,6,7,8])
[1, 64, 2187, 65536]
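For intuition, a Torque configuration like the one above corresponds roughly to a submission script along these lines (a hedged sketch of standard PBS/Torque directives; the exact script pyina generates, and its walltime format, may differ):

```shell
#!/bin/sh
#PBS -l nodes=32:ppn=4
#PBS -q dedicated
#PBS -l walltime=11:59:00
cd $PBS_O_WORKDIR
# launch the MPI-parallel map across the allocated nodes
mpiexec python my_map_script.py
```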
pyina needs some TLC, in that it's still python2.7 and hasn't had a release in several years… but otherwise it has been kept up to date (on GitHub) and has been able to "get the job done" for me on large-scale computing clusters over the past 10 years -- especially when coupled with pathos (which provides ssh tunneling and a unified interface for multiprocessing and ParallelPython maps). pyina doesn't yet utilize shared memory, but it does embarrassingly parallel functional computing pretty well. The interactions with the scheduler are pretty good in general, but can be a bit rough around the edges for several failure cases -- and the non-blocking maps need a lot of work. That said, it provides a pretty useful interface for running embarrassingly parallel jobs on a cluster with MPI.
Get pyina (and pathos) here: https://github.com/uqfoundation