 

How do I reduce memory consumption in a loop in TensorFlow?

I have a loop in TensorFlow that looks like this:

with tf.device("/gpu:1"):
    losses = []

    # One projection + softmax loss per timestep; W and b are shared
    # across all timesteps.
    for target, output in zip(targets, lstm_outputs):
        logits = tf.matmul(W, output) + b
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, target)
        losses.append(loss)

    total_loss = tf.add_n(losses)

I am getting an OOM error when allocating the gradients for this layer, since each matrix multiplication is a different operation in the graph taking memory. Is there a way of preventing TensorFlow from allocating all these operations at the same time?

Asked Mar 24 '16 by Maarten


1 Answer

This is a challenging graph for TensorFlow to optimize, because the activations from each timestep must be kept in memory in order to aggregate a single gradient for W. One possibility is to pass the experimental aggregation_method argument when calling optimizer.minimize().

For example, you could try the following:

optimizer = tf.train.AdagradOptimizer(...)  # Or another optimization algorithm.
train_op = optimizer.minimize(
    total_loss,
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)

This option eagerly aggregates the gradients for recurrently used variables in place, rather than keeping them all in memory until all of the gradients have been computed. If this doesn't help, tf.AggregationMethod.EXPERIMENTAL_TREE may work better.
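For reference, a minimal sketch of that fallback, assuming the same total_loss graph as above (the Adagrad optimizer and the 0.01 learning rate are placeholders):

optimizer = tf.train.AdagradOptimizer(0.01)  # Placeholder optimizer and learning rate.
train_op = optimizer.minimize(
    total_loss,
    # Combine the per-timestep gradient contributions with a tree of adds
    # instead of a single add_n over all of them at once.
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)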

Answered Sep 23 '22 by mrry