Google released TensorFlow today.
I have been poking around, and I don't see anything in the code or API about training across a cluster of GPU servers.
Does it have distributed training functionality yet?
Updated:
Distributed TensorFlow Documentation
Distributed TensorFlow Source
The release occurred on 2016-02-26 and was announced by coauthor Derek Murray in the original issue here. It uses gRPC for inter-process communication.
Previous:
Before the update above, a distributed implementation of TensorFlow had not been released yet. Support for a distributed implementation was the topic of this issue where coauthor Vijay Vasudevan wrote:
we are working on making a distributed implementation available, it's currently not in the initial release
and Jeff Dean later provided an update:
Our current internal distributed extensions are somewhat entangled with Google internal infrastructure, which is why we released the single-machine version first. The code is not yet in GitHub, because it has dependencies on other parts of the Google code base at the moment, most of which have been trimmed, but there are some remaining ones.
We realize that distributed support is really important, and it's one of the top features we're prioritizing at the moment.
It took us a few months, but today marks the release of the initial distributed TensorFlow runtime. This includes support for multiple machines, each with multiple GPUs, with communication provided by gRPC.
The current version includes the necessary backend components so that you can assemble a cluster manually and connect to it from a client program. More details are available in the readme.
As you may have noticed, TensorFlow has supported distributed DNN training for quite some time now. Please refer to its official website for details.
=========================================================================
No, it doesn't support distributed training yet, which is a little disappointing. But I don't think it would be difficult to extend it from a single machine to multiple machines. Compared to other open-source libraries such as Caffe, TF's dataflow-graph structure is better suited to cross-machine tasks.