Keras (TensorFlow, CPU): Training Sequential models in loop eats memory

Tags: python-3.x, tensorflow, keras

I am trying to train 1000 Sequential models in a loop. On every iteration my program leaks memory until it runs out and raises an OOM exception.

I asked a similar question before (Training multiple Sequential models in a row slows down) and have seen others with similar problems (Keras: Out of memory when doing hyper parameter grid search). The suggested solution is always to call K.clear_session() once you have finished using the model. I did that after my previous question, and I am still leaking memory.

Here is code to reproduce the issue.

import random
import time
import tracemalloc

import numpy  # needed by create_training_dataset below
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential


def run():
    tracemalloc.start()
    num_input_nodes = 12
    num_hidden_nodes = 8
    num_output_nodes = 1

    random_numbers = random.sample(range(1000), 50)
    train_x, train_y = create_training_dataset(random_numbers, num_input_nodes)

    for i in range(100):
        snapshot = tracemalloc.take_snapshot()
        for j in range(10):
            start_time = time.time()
            nn = Sequential()
            nn.add(Dense(num_hidden_nodes, input_dim=num_input_nodes, activation='relu'))
            nn.add(Dense(num_output_nodes))
            nn.compile(loss='mean_squared_error', optimizer='adam')
            nn.fit(train_x, train_y, nb_epoch=300, batch_size=2, verbose=0)
            K.clear_session()
            print("Iteration {iter}. Current time {t}. Took {elapsed} seconds".
                  format(iter=i*10 + j + 1, t=time.strftime('%H:%M:%S'), elapsed=int(time.time() - start_time)))

        top_stats = tracemalloc.take_snapshot().compare_to(snapshot, 'lineno')

        print("[ Top 5 differences ]")
        for stat in top_stats[:5]:
            print(stat)


def create_training_dataset(dataset, input_nodes):
    """
    Outputs a training dataset (train_x, train_y) as numpy arrays.
    Each item in train_x has 'input_nodes' number of items while train_y items are of size 1
    :param dataset: list of ints
    :param input_nodes:
    :return: (numpy array, numpy array), train_x, train_y
    """
    data_x, data_y = [], []
    for i in range(len(dataset) - input_nodes - 1):
        a = dataset[i:(i + input_nodes)]
        data_x.append(a)
        data_y.append(dataset[i + input_nodes])
    return numpy.array(data_x), numpy.array(data_y)

run()

Here is the output I get from the first memory debug print:

/tensorflow/python/framework/ops.py:121: size=3485 KiB (+3485 KiB), count=42343 (+42343)
/tensorflow/python/framework/ops.py:1400: size=998 KiB (+998 KiB), count=8413 (+8413)
/tensorflow/python/framework/ops.py:116: size=888 KiB (+888 KiB), count=32468 (+32468)
/tensorflow/python/framework/ops.py:1185: size=795 KiB (+795 KiB), count=3179 (+3179)
/tensorflow/python/framework/ops.py:2354: size=599 KiB (+599 KiB), count=5886 (+5886)

System info:

python (3.5)
keras (1.2.2)
tensorflow (1.0.0)

887

asked Mar 19 '17 11:03

G_E

1 Answer

The memory leak stems from Keras and TensorFlow using a single "default graph" to store the network structure, which increases in size with each iteration of the inner for loop.
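One way to confirm this diagnosis is to watch the number of operations in the default graph grow from one iteration to the next. The following is a minimal sketch, not part of the original answer: count_graph_ops and report_graph_growth are hypothetical helper names, and the only TensorFlow method they rely on is Graph.get_operations().

```python
def count_graph_ops(graph):
    """Return the number of operations currently stored in a graph.

    `graph` is expected to be a tf.Graph (or any object that exposes
    get_operations(), which returns a list of operations).
    """
    return len(graph.get_operations())


def report_graph_growth(graph, step, log=print):
    """Log the op count for one iteration and return it.

    `step` and `log` are illustrative parameters, not Keras or TensorFlow API.
    """
    n_ops = count_graph_ops(graph)
    log("iteration {}: {} ops in graph".format(step, n_ops))
    return n_ops
```

With TensorFlow 1.x you could call report_graph_growth(tf.get_default_graph(), i) at the end of each inner iteration. A count that keeps rising suggests that models from earlier iterations are still held in the graph.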

Calling K.clear_session() frees some of the (backend) state associated with the default graph between iterations, but an additional call to tf.reset_default_graph() is needed to clear the Python state.
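The two calls can be combined in a small helper that runs after every model. This is a sketch under the question's setup (Keras 1.x on TensorFlow 1.x); in TensorFlow 2.x the function is only available as tf.compat.v1.reset_default_graph(). The names reset_backend_state, train_many and build_and_fit are hypothetical, and the gc.collect() call is an optional extra, not something the answer requires.

```python
import gc


def reset_backend_state():
    """Drop the Keras session and the TensorFlow default graph (TF 1.x API)."""
    # Imported lazily so this helper can be defined without loading TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    K.clear_session()         # frees backend state tied to the default graph
    tf.reset_default_graph()  # clears the Python-side graph state
    gc.collect()              # optional: reclaim unreachable objects promptly


def train_many(build_and_fit, num_models, reset=reset_backend_state):
    """Call build_and_fit() num_models times, resetting state after each run.

    build_and_fit is a hypothetical zero-argument callable that builds,
    compiles and fits one model, then returns plain Python results
    (for example a list of losses) rather than tensors or the model itself.
    """
    results = []
    for _ in range(num_models):
        results.append(build_and_fit())
        reset()  # nothing from the finished model should be reused after this
    return results
```

In the question's code, the body of the inner loop (from nn = Sequential() through nn.fit(...)) would become build_and_fit. Returning only plain values matters because any tensor or model kept alive after the reset still references the old graph.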

Note that there might be a more efficient solution: since nn does not depend on either of the loop variables, you can define it outside the loop, and reuse the same instance inside the loop. If you do that, there is no need to clear the session or reset the default graph, and performance should increase because you benefit from caching between iterations.
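A sketch of the reuse approach follows. Restoring the initial weights before each run is an assumption added here so that every run starts from the same state; the answer itself only says to reuse the instance. train_repeatedly and fit_kwargs are hypothetical names, while get_weights(), set_weights() and fit() are existing Keras model methods.

```python
def train_repeatedly(model, train_x, train_y, num_runs, fit_kwargs=None):
    """Fit one already-compiled model several times without rebuilding it.

    The weights are restored to their initial values before each run
    (an assumption of this sketch). Optimizer state, such as Adam's moment
    estimates, is not reset by set_weights() and carries over between runs.
    """
    fit_kwargs = fit_kwargs or {}
    initial_weights = model.get_weights()  # snapshot taken once, before any run
    histories = []
    for _ in range(num_runs):
        model.set_weights(initial_weights)
        histories.append(model.fit(train_x, train_y, **fit_kwargs))
    return histories
```

With the question's Keras 1.2.2 this could be called as train_repeatedly(nn, train_x, train_y, 1000, fit_kwargs={'nb_epoch': 300, 'batch_size': 2, 'verbose': 0}), with nn built and compiled once before the call. Note that every run then starts from identical weights, which differs from building a freshly initialized model each time.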

163

answered Sep 25 '22 14:09

mrry
