I'm trying to run distributed TensorFlow on an EMR/EC2 cluster, but I don't know how to assign parts of the computation to particular instances in the cluster. The documentation uses tf.device("/gpu:0") to specify a GPU, but what if I have a master CPU instance and 5 GPU worker instances running in an EMR cluster, and I want those GPUs to run some of the code? I can't pass the instances' public DNS names to tf.device() because it throws an error saying the name cannot be resolved.
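The usual approach is not to put hostnames inside tf.device() at all. Instead, you define a tf.train.ClusterSpec that maps job names to host:port addresses, start a server on each instance, and then pin ops to devices by job and task index. A minimal sketch, assuming hypothetical hostnames like "master-node" and "gpu-node-0" (substitute your instances' private DNS names):

```python
import tensorflow as tf

# Map job names to the host:port of each instance.
# Hostnames here are placeholders -- use your cluster's private DNS names.
cluster = tf.train.ClusterSpec({
    "master": ["master-node:2222"],
    "worker": ["gpu-node-%d:2222" % i for i in range(5)],
})

# Each instance runs a server that identifies itself by job name and
# task index, e.g. on the second GPU instance:
#   server = tf.distribute.Server(cluster, job_name="worker", task_index=1)
#   server.join()

# Ops are then pinned to devices by job/task, never by raw DNS name:
#   with tf.device("/job:worker/task:1/device:GPU:0"):
#       ...  # this code runs on gpu-node-1's first GPU

print(cluster.as_dict()["worker"])
```

Because the device string refers to "/job:worker/task:1" rather than a hostname, TensorFlow resolves it through the ClusterSpec, which avoids the name-resolution error you're seeing.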
Since your question was asked, AWS has released some code to ease the use of distributed TensorFlow on an EC2 cluster. See this GitHub repository. Everything is described in its README.md, but the short story is that it will create an AWS CloudFormation stack with the instances pre-configured for distributed TensorFlow.