I have a loop in TensorFlow that looks like this:
with tf.device("/gpu:1"):
    losses = []
    for target, output in zip(targets, lstm_outputs):
        logits = tf.matmul(W, output) + b
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, target)
        losses.append(loss)
    total_loss = tf.add_n(losses)
I am getting an OOM error when allocating the gradients for this layer, since each matrix multiplication is a separate operation in the graph and each one consumes memory. Is there a way to prevent TensorFlow from allocating all of these operations at the same time?
This is a challenging graph for TensorFlow to optimize, since the activations from each step must be kept in memory in order to aggregate a single gradient for W. One possibility is to pass the experimental aggregation_method argument when calling optimizer.minimize().
For example, you could try the following:
optimizer = tf.train.AdagradOptimizer(...)  # Or another optimization algorithm.
train_op = optimizer.minimize(
    total_loss,
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
This option eagerly aggregates the gradients for recurrently used variables in place, rather than keeping them all in memory until every gradient has been computed. If this doesn't work, the tf.AggregationMethod.EXPERIMENTAL_TREE option may work better.
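For reference, switching to the tree-based aggregation is only a one-argument change. A minimal sketch, reusing the optimizer and total_loss defined above:

# Sketch only: same optimizer and loss as above, with the
# aggregation_method argument swapped to the tree-based variant.
train_op = optimizer.minimize(
    total_loss,
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)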