I would like to use multiple GPUs to train my TensorFlow model, taking advantage of data parallelism.
I am currently training the model with the following approach:
x_ = tf.placeholder(...)
y_ = tf.placeholder(...)
y = model(x_)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
optimizer = tf.train.AdamOptimizer()
train_op = tf.contrib.training.create_train_op(loss, optimizer)
for i in epochs:
    for b in data:
        _ = sess.run(train_op, feed_dict={x_: b.x, y_: b.y})
I would like to take advantage of multiple GPUs to train this model in a data-parallel manner, i.e. split each batch in half and run each half on one of my two GPUs.
cifar10_multi_gpu_train.py seems to provide a good example of building a loss that draws from towers running on multiple GPUs, but I haven't found a good example of doing this style of training when using feed_dict and placeholder as opposed to a data loader queue.
UPDATE
It seems like https://timsainb.github.io/multi-gpu-vae-gan-in-tensorflow.html might provide a good example. They pull in average_gradients from cifar10_multi_gpu_train.py and create one placeholder, which they then slice into once per GPU.
I think you also need to split create_train_op into three stages: compute_gradients, average_gradients, and then apply_gradients.
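Here is a minimal sketch of that three-stage approach, assuming a model() function, an average_gradients() helper copied from cifar10_multi_gpu_train.py, two GPUs, and a batch size divisible by the GPU count; the shapes and the batch_x / batch_y names are illustrative, not from the original post.

num_gpus = 2
x_ = tf.placeholder(tf.float32, [None, 784], name="x")   # example shape, adjust to your data
y_ = tf.placeholder(tf.int64, [None], name="y")
optimizer = tf.train.AdamOptimizer()

# Split the one big placeholder along the batch dimension, one slice per GPU.
x_splits = tf.split(x_, num_gpus, axis=0)
y_splits = tf.split(y_, num_gpus, axis=0)

tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i), tf.name_scope("tower_%d" % i):
            logits = model(x_splits[i])
            loss = tf.losses.sparse_softmax_cross_entropy(labels=y_splits[i], logits=logits)
            # Stage 1: per-tower gradients.
            tower_grads.append(optimizer.compute_gradients(loss))
            tf.get_variable_scope().reuse_variables()  # share weights between towers

# Stage 2: average the per-tower gradients (helper from cifar10_multi_gpu_train.py).
grads = average_gradients(tower_grads)
# Stage 3: apply the averaged gradients once.
train_op = optimizer.apply_gradients(grads)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    _ = sess.run(train_op, feed_dict={x_: batch_x, y_: batch_y})  # feed the full batch; the graph splits it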
If you have more than one GPU, the GPU with the lowest ID is selected by default, but TensorFlow does not spread operations across multiple GPUs automatically. To use more than one GPU, you have to override the default placement and specify explicitly which device each part of the graph should run on.
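For example (a toy sketch; the ops themselves are arbitrary):

import tensorflow as tf

a = tf.random_normal([1024, 1024])
with tf.device("/gpu:0"):
    b = tf.matmul(a, a)   # placed on GPU 0
with tf.device("/gpu:1"):
    c = tf.matmul(a, a)   # placed on GPU 1

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run([b, c])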
tf.distribute.Strategy is a TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.
Its advantages: it scales to large models with millions or billions of parameters (GPT-2, GPT-3, BERT, et cetera), it can keep latency across workers low, and it has good TensorFlow community support.
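A minimal TF 2.x sketch with tf.distribute.MirroredStrategy (the Keras model, layer sizes, and the features/labels arrays are illustrative assumptions):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()           # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                                # variables created here are mirrored on every GPU
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The global batch is split across the replicas automatically.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(64)
model.fit(dataset, epochs=5)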
I know three ways of feeding data to a multi-GPU model.
1. Create the placeholder x on the CPU, then use tf.split to split x into xs along the batch dimension. On each GPU tower, take xs[i] as that tower's input.

with tf.device("/cpu:0"):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
    encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")

    # make sure batch % num_gpus == 0
    inputs = tf.split(encoder_inputs, num_gpus, axis=0)  # axis=0: split on the batch dimension
    lens = tf.split(encoder_length, num_gpus, axis=0)

with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                loss = compute_loss(inputs[i], lens[i])  # per-tower loss on slice i
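Feeding then works exactly as in the single-GPU case: you feed the full batch into the CPU placeholders and the split happens inside the graph. A sketch, assuming a train_op built from the per-tower losses and batch arrays batch_inputs / batch_lengths:

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    _ = sess.run(train_op, feed_dict={encoder_inputs: batch_inputs,    # full batch
                                      encoder_length: batch_lengths})  # sliced in-graph per tower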
2. Create the placeholder x on every GPU, inside a scope.

def init_placeholder(self):
    with tf.variable_scope("inputs"):  # use a scope
        encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
        encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")
    return encoder_inputs, encoder_length

with tf.variable_scope(tf.get_variable_scope()):
    for g, gpu in enumerate(GPUS):
        with tf.device("/gpu:%d" % gpu):
            with tf.name_scope("tower_%d" % g):
                x, x_len = model.init_placeholder()  # these placeholder Tensors live on the GPU
                loss = model.compute_loss(x, x_len)
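With this layout each tower has its own placeholders, so the feed_dict has to map every tower's placeholders to its own slice of the host batch. A sketch, assuming the (x, x_len) pairs from the loop above were collected into a hypothetical list tower_inputs and that train_op and the batch arrays already exist:

feed = {}
for g, (x, x_len) in enumerate(tower_inputs):        # one (x, x_len) pair per GPU tower
    feed[x] = batch_inputs[g::len(tower_inputs)]     # e.g. a strided split of the host batch
    feed[x_len] = batch_lengths[g::len(tower_inputs)]
_ = sess.run(train_op, feed_dict=feed)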
3. Use tf.data.Dataset to feed the data. The official cifar10_multi_gpu_train.py uses a Queue, which is similar to this approach.