I am writing a machine learning toolkit to run algorithm with different settings in parallel (each process run the algorithm for one setting). I am thinking about either to use mpi4py or python's build-in multiprocessing ?
There are a few pros and cons I am considering about.
Easy-to-use:
Speed:
Clean and short code:
The working context is I am aiming at running the code basically in one computer or a GPU server. Not really targeting at running in different machines in the network (which only MPI can do it).
And since the main goal is doing machine learning, so the parallelization is not really required to be very optimal, the key goal I want to have is to balance easy, clean and quick to maintain code base but at the same time like to exploit the benefits of parallelization.
With the background described above, is it recommended that using multiprocessing should just be enough ? Or is there a very strong reason to use mpi4py ?
By using mpi4py you can divide the task into multiple threads, but with a single computer with limited performance or number of cores the usability will be limited. However you might find it handy during training.
mpi4py is constructed on top of the MPI-1/2 specifications and provides an object oriented interface which closely follows MPI-2 C++ bindings.
MPI for Python provides MPI bindings for the Python language, allowing programmers to exploit multiple processor computing systems. MPI for Python supports convenient, pickle-based communication of generic Python object as well as fast, near C-speed, direct array data communication of buffer-provider objects
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With