Implications of using MPI with TensorFlow

Question

I come from an HPC background and I am just starting to learn about machine learning in general and TensorFlow in particular. I was initially surprised to find out that distributed TensorFlow is designed to communicate over TCP/IP by default, though it makes sense in hindsight given what Google is and the kind of hardware it most commonly uses.

I am interested in experimenting with running TensorFlow in parallel with MPI on a cluster. From my perspective, this should be advantageous because latency should be much lower, since MPI implementations can use Remote Direct Memory Access (RDMA) between machines that do not share memory.

So my question is: why doesn't this approach seem to be more common, given the increasing popularity of TensorFlow and machine learning? Isn't latency a bottleneck? Is there some typical problem being solved that makes this sort of solution impractical? Are there likely to be any meaningful differences between calling TensorFlow functions in parallel from an MPI program and implementing MPI calls inside the TensorFlow library?

Thanks

Gilles Gouaillardet · Accepted Answer

It seems TensorFlow already supports MPI, as stated at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/mpi. MPI support for TensorFlow was also discussed at https://arxiv.org/abs/1603.02339.
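For comparison, the other option from the question (calling TensorFlow from an ordinary MPI program rather than putting MPI inside the library) can be sketched with mpi4py. This is only a minimal illustration, not how the contrib/mpi module works: the function name `average_across_ranks` is made up for this example, the gradients are plain Python lists standing in for tensors, and the code falls back to a single process when mpi4py is not installed.

```python
try:
    from mpi4py import MPI  # needs an MPI library; launch with e.g. mpirun
    comm = MPI.COMM_WORLD
except ImportError:
    comm = None  # assumption: fall back to one process so the sketch still runs


def average_across_ranks(local_grads, comm=comm):
    """Return the element-wise mean of local_grads over all MPI ranks.

    Hypothetical helper: each rank would compute local_grads with its own
    TensorFlow graph and apply the averaged result to its copy of the model.
    """
    if comm is None or comm.Get_size() == 1:
        return list(local_grads)
    # Lowercase allgather pickles Python objects: simple, but slow for
    # large tensors (buffer-based Allreduce would be the usual choice).
    all_grads = comm.allgather(list(local_grads))
    size = comm.Get_size()
    return [sum(column) / size for column in zip(*all_grads)]


if __name__ == "__main__":
    rank = comm.Get_rank() if comm is not None else 0
    # Stand-in for gradients produced by a local training step.
    print(rank, average_across_ranks([float(rank), 1.0]))
```

Run under a launcher such as `mpirun -n 4 python script.py`, every rank should print the same averaged list. Whether this beats the built-in transport depends on the model and the interconnect.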

Generally speaking, keep in mind that MPI is best at sending and receiving messages, but not so great at sending notifications and acting upon events. Last but not least, MPI support for multi-threaded applications (e.g. MPI_THREAD_MULTIPLE) has not always been production-ready across MPI implementations. These are two general statements, and I honestly do not know whether they are relevant for TensorFlow.
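If you want to check what your own MPI installation provides, a small mpi4py sketch like the one below asks for MPI_THREAD_MULTIPLE and reports the level actually granted. The function name `mpi_thread_support` is hypothetical, and returning `None` when mpi4py is missing is an assumption made so that the snippet runs anywhere.

```python
def mpi_thread_support():
    """Return the thread level MPI provided, or None if mpi4py is missing."""
    try:
        import mpi4py
        # Must be set before MPI is imported, because that import
        # initializes MPI with the requested thread level.
        mpi4py.rc.thread_level = "multiple"
        from mpi4py import MPI
    except ImportError:
        return None
    names = {
        MPI.THREAD_SINGLE: "single",
        MPI.THREAD_FUNNELED: "funneled",
        MPI.THREAD_SERIALIZED: "serialized",
        MPI.THREAD_MULTIPLE: "multiple",
    }
    # The library may grant a lower level than the one requested.
    return names.get(MPI.Query_thread(), "unknown")


if __name__ == "__main__":
    print(mpi_thread_support())
```

Anything other than "multiple" means concurrent MPI calls from several threads are not safe with that build.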

Kehe CAI · Answer

According to the documentation in the TensorFlow Git repository, TF actually uses the gRPC library by default, which is based on the HTTP/2 protocol rather than raw TCP/IP sockets (HTTP/2 itself still runs over TCP). This paper should give you some insight. I hope this information is useful.

Tags:

python

tensorflow

mpi

mpi4py

Cogitator
