I plan to run a very large recurrent network (e.g. 2048 units × 5 layers). Is it possible to place each layer on a different GPU in TensorFlow? How should I implement the model to achieve the best efficiency? I understand there is overhead for inter-GPU or GPU-CPU-GPU communication.
Splitting a large model across multiple GPUs is certainly possible in TensorFlow, but doing it optimally is a hard research problem. In general, you will need to do the following:
Wrap large contiguous regions of your code in a with tf.device(...): block, naming the different GPUs:
with tf.device("/gpu:0"):
# Define first layer.
with tf.device("/gpu:1"):
# Define second layer.
# Define other layers, etc.
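As a concrete illustration, here is a minimal sketch of two 2048-unit LSTM layers pinned to different GPUs. The input shape, placeholder, and scope names are assumptions for the example, not part of the original question:

import tensorflow as tf

# Hypothetical input of shape [batch, time, features].
inputs = tf.placeholder(tf.float32, [None, None, 512])

with tf.device("/gpu:0"):
    # First layer runs entirely on GPU 0.
    cell_0 = tf.nn.rnn_cell.LSTMCell(2048)
    outputs_0, _ = tf.nn.dynamic_rnn(cell_0, inputs, dtype=tf.float32,
                                     scope="layer_0")

with tf.device("/gpu:1"):
    # Second layer consumes the first layer's output; TensorFlow inserts
    # the GPU 0 -> GPU 1 copy automatically.
    cell_1 = tf.nn.rnn_cell.LSTMCell(2048)
    outputs_1, _ = tf.nn.dynamic_rnn(cell_1, outputs_0, dtype=tf.float32,
                                     scope="layer_1")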
When building your optimizer, pass the optional argument colocate_gradients_with_ops=True to the optimizer.minimize() method. This places each gradient op on the same device as the corresponding forward-pass op, so the backward pass is split across the GPUs in the same way as the forward pass:
loss = ...
optimizer = tf.train.AdagradOptimizer(0.01)
train_op = optimizer.minimize(loss, colocate_gradients_with_ops=True)
(Optional.) You may need to enable "soft placement" in the tf.ConfigProto when you create your tf.Session, if any of the operations in your model cannot run on GPU:
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)
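To confirm that each layer's ops actually ended up on the GPU you assigned, you can also turn on device-placement logging in the same ConfigProto; log_device_placement is a standard ConfigProto field, and this is just a small usage sketch:

import tensorflow as tf

# Soft placement plus a log line for every op's assigned device, so you
# can verify that each layer really runs on its intended GPU.
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
sess = tf.Session(config=config)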