
Keras (TensorFlow, CPU): Training Sequential models in loop eats memory

I am trying to train 1000 Sequential models in a loop. On every iteration my program leaks memory, until it runs out and raises an OOM exception.

I asked a similar question before (Training multiple Sequential models in a row slows down) and have seen others with similar problems (Keras: Out of memory when doing hyper parameter grid search). The suggested solution is always to call K.clear_session() after you have finished using the model. I did that after my previous question, and I am still leaking memory.

Here is code to reproduce the issue.

import random
import time
import tracemalloc

import numpy  # used by create_training_dataset()
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential


def run():
    tracemalloc.start()
    num_input_nodes = 12
    num_hidden_nodes = 8
    num_output_nodes = 1

    random_numbers = random.sample(range(1000), 50)
    train_x, train_y = create_training_dataset(random_numbers, num_input_nodes)

    for i in range(100):
        snapshot = tracemalloc.take_snapshot()
        for j in range(10):
            start_time = time.time()
            nn = Sequential()
            nn.add(Dense(num_hidden_nodes, input_dim=num_input_nodes, activation='relu'))
            nn.add(Dense(num_output_nodes))
            nn.compile(loss='mean_squared_error', optimizer='adam')
            nn.fit(train_x, train_y, nb_epoch=300, batch_size=2, verbose=0)
            K.clear_session()
            print("Iteration {iter}. Current time {t}. Took {elapsed} seconds".
                  format(iter=i*10 + j + 1, t=time.strftime('%H:%M:%S'), elapsed=int(time.time() - start_time)))

        top_stats = tracemalloc.take_snapshot().compare_to(snapshot, 'lineno')

        print("[ Top 5 differences ]")
        for stat in top_stats[:5]:
            print(stat)


def create_training_dataset(dataset, input_nodes):
    """
    Outputs a training dataset (train_x, train_y) as numpy arrays.
    Each item in train_x has 'input_nodes' number of items while train_y items are of size 1
    :param dataset: list of ints
    :param input_nodes: number of consecutive values used as one input sample
    :return: (numpy array, numpy array), train_x, train_y
    """
    data_x, data_y = [], []
    for i in range(len(dataset) - input_nodes - 1):
        a = dataset[i:(i + input_nodes)]
        data_x.append(a)
        data_y.append(dataset[i + input_nodes])
    return numpy.array(data_x), numpy.array(data_y)

run()

Here is the output I get from the first memory debug print

/tensorflow/python/framework/ops.py:121: size=3485 KiB (+3485 KiB), count=42343 (+42343)
/tensorflow/python/framework/ops.py:1400: size=998 KiB (+998 KiB), count=8413 (+8413)
/tensorflow/python/framework/ops.py:116: size=888 KiB (+888 KiB), count=32468 (+32468)
/tensorflow/python/framework/ops.py:1185: size=795 KiB (+795 KiB), count=3179 (+3179)
/tensorflow/python/framework/ops.py:2354: size=599 KiB (+599 KiB), count=5886 (+5886)

System info:

  • python 3.5
  • keras (1.2.2)
  • tensorflow (1.0.0)
G_E asked Mar 19 '17 11:03




1 Answer

The memory leak stems from Keras and TensorFlow using a single "default graph" to store the network structure, which increases in size with each iteration of the inner for loop.

Calling K.clear_session() frees some of the (backend) state associated with the default graph between iterations, but an additional call to tf.reset_default_graph() is needed to clear the Python state.
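The effect of resetting between iterations can be sketched with a toy model in plain Python. This is only an illustration of the mechanism described above, not TensorFlow code: `_default_graph`, `build_model`, `reset_default_graph` and `train_many` are hypothetical stand-ins, and a real graph holds far more state than a list of strings.

```python
# Toy stand-in for a process-wide default graph (an assumption for
# illustration; this is not TensorFlow's real data structure).
_default_graph = []


def build_model(num_layers=2, ops_per_layer=3):
    """Pretend to build a model: every layer registers ops in the global graph."""
    for layer in range(num_layers):
        for op in range(ops_per_layer):
            _default_graph.append("layer{}/op{}".format(layer, op))


def reset_default_graph():
    """Hypothetical analogue of a graph reset: drop every registered op."""
    del _default_graph[:]


def train_many(iterations, reset_between):
    """Build `iterations` models and return the largest graph size observed."""
    reset_default_graph()  # start each experiment from an empty graph
    peak = 0
    for _ in range(iterations):
        build_model()
        peak = max(peak, len(_default_graph))
        if reset_between:
            reset_default_graph()
    return peak


leaky_peak = train_many(100, reset_between=False)   # grows with every model
bounded_peak = train_many(100, reset_between=True)  # one model's worth of ops
print(leaky_peak, bounded_peak)
```

In the question's loop, the corresponding change would be to import tensorflow as tf and call tf.reset_default_graph() next to K.clear_session() at the end of each inner iteration.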

Note that there might be a more efficient solution: since nn does not depend on either of the loop variables, you can define it outside the loop, and reuse the same instance inside the loop. If you do that, there is no need to clear the session or reset the default graph, and performance should increase because you benefit from caching between iterations.
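A rough sketch of why building the model once keeps the graph from growing, again using a plain-Python toy rather than Keras itself. `ToyModel`, `_default_graph` and `graph_size_history` are hypothetical names, and the toy assumes that fitting adds nothing to the graph after construction. Keep in mind that a real Keras model that is reused keeps its trained weights between fit() calls, so each run continues from the previous one.

```python
# Toy stand-in for the process-wide default graph (illustrative assumption,
# not TensorFlow's real structure).
_default_graph = []


class ToyModel:
    """Hypothetical model: constructing it registers ops, fitting it does not."""

    def __init__(self):
        _default_graph.extend(["dense_1", "dense_2", "loss", "optimizer"])
        self.fit_calls = 0

    def fit(self, train_x, train_y):
        # Reuses the ops registered in __init__; nothing new is added.
        self.fit_calls += 1


def graph_size_history(iterations, reuse_model):
    """Run `iterations` fits and record the graph size after each one."""
    del _default_graph[:]  # start from an empty graph
    shared = ToyModel() if reuse_model else None
    history = []
    for _ in range(iterations):
        model = shared if reuse_model else ToyModel()
        model.fit([[0.0]], [0.0])
        history.append(len(_default_graph))
    return history


rebuilt = graph_size_history(5, reuse_model=False)  # new model every iteration
reused = graph_size_history(5, reuse_model=True)    # one model, many fits
print(rebuilt)
print(reused)
```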

mrry answered Sep 25 '22 14:09