I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?
I searched the TensorFlow docs but did not find an example, only sentences saying that it would be easy with Estimator.
Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial?
I think this is all you need.
Link: https://www.youtube.com/watch?v=bRMGoPqsn20
More Details: https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy
Explained: https://medium.com/tensorflow/multi-gpu-training-with-estimators-tf-keras-and-tf-data-ba584c3134db
NUM_GPUS = 8
# Mirror the model across NUM_GPUS GPUs; each replica processes a slice of the
# batch and gradients are aggregated across replicas.
dist_strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=NUM_GPUS)
config = tf.estimator.RunConfig(train_distribute=dist_strategy)
estimator = tf.estimator.Estimator(model_fn, model_dir, config=config)
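Training then works exactly as in the single-GPU case; a minimal sketch, assuming you already have a train_input_fn that returns a tf.data.Dataset of (features, labels):
# The strategy set in the RunConfig splits each batch across the GPUs automatically.
estimator.train(input_fn=train_input_fn, max_steps=10000)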
UPDATE
With TF 2.0 and Keras you can use tf.distribute.MirroredStrategy directly (https://www.tensorflow.org/tutorials/distribute/keras).
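The core of that tutorial is just building and compiling the Keras model inside the strategy scope. A minimal sketch under that assumption (the toy model and random data are only placeholders):
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs by default
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored on every GPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# fit() distributes each batch across the replicas.
x = np.random.rand(1024, 784).astype("float32")
y = np.random.randint(0, 10, size=(1024,))
model.fit(x, y, batch_size=256, epochs=2)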
I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The following is from the tf.contrib.estimator.replicate_model_fn documentation:
...
def model_fn(...):  # See `model_fn` in `Estimator`.
  loss = ...
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
  if mode == tf.estimator.ModeKeys.TRAIN:
    # See the section below on `EstimatorSpec.train_op`.
    return EstimatorSpec(mode=mode, loss=loss,
                         train_op=optimizer.minimize(loss))
  # No change for `ModeKeys.EVAL` or `ModeKeys.PREDICT`.
  return EstimatorSpec(...)
...
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
What you need to do is wrap the optimizer with tf.contrib.estimator.TowerOptimizer and model_fn() with tf.contrib.estimator.replicate_model_fn(); a complete sketch follows below.
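Put together, a minimal self-contained TF 1.x sketch might look like this (the tiny linear model and the feature key "x" are placeholders I made up for illustration):
import tensorflow as tf

def model_fn(features, labels, mode):
    # Toy linear model, only to illustrate the wrapping.
    logits = tf.layers.dense(features["x"], 10)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={"logits": logits})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        # Aggregate the gradients computed on each GPU tower.
        optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss,
            train_op=optimizer.minimize(loss, global_step=tf.train.get_global_step()))

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss)

# Build one tower per available GPU and run them in parallel on each batch.
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))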
I followed the description and made a TPU SqueezeNet model work on a machine with 4 GPUs. My modifications are here.