 

Keras model training memory leak

I'm new to Keras, TensorFlow and Python, and I'm trying to build a model for personal use/future learning. I've just started with Python and came up with this code (with the help of videos and tutorials). My problem is that Python's memory usage slowly creeps up with each epoch, and even after a new model is constructed. Once memory hits 100%, training just stops with no error or warning. I don't know much yet, but the issue should be somewhere within the loop (if I'm not mistaken). I know about

K.clear_session()

but either it doesn't remove the issue or I don't know how to integrate it into my code. I have Python 3.6.4, TensorFlow 2.0.0rc1 (CPU version) and Keras 2.3.0.
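
That is, the backend call spelled out (with the backend import for the tf.keras that ships with TensorFlow 2.0):

from tensorflow.keras import backend as K
K.clear_session()  # resets the graph state that Keras builds models into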

This is my code:

import pandas as pd
import os
import time
import tensorflow as tf
import numpy as np
import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint

EPOCHS = 25
BATCH_SIZE = 32           

df = pd.read_csv("EntryData.csv", names=['1SH5', '1SHA', '1SA5', '1SAA', '1WH5', '1WHA',
                                         '2SA5', '2SAA', '2SH5', '2SHA', '2WA5', '2WAA',
                                         '3R1', '3R2', '3R3', '3R4', '3R5', '3R6',
                                         'Target'])

df_val = 14554 

validation_df = df[df.index > df_val]
df = df[df.index <= df_val]

train_x = df.drop(columns=['Target'])
train_y = df[['Target']]
validation_x = validation_df.drop(columns=['Target'])
validation_y = validation_df[['Target']]

train_x = np.asarray(train_x)
train_y = np.asarray(train_y)
validation_x = np.asarray(validation_x)
validation_y = np.asarray(validation_y)

train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
validation_x = validation_x.reshape(validation_x.shape[0], 1, validation_x.shape[1])

dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            NAME = "{}-conv-{}-nodes-{}-dense-{}".format(conv_layer, layer_size, 
                    dense_layer, int(time.time()))
            tensorboard = TensorBoard(log_dir="logs\{}".format(NAME))
            print(NAME)

            model = Sequential()
            model.add(LSTM(layer_size, input_shape=(train_x.shape[1:]), 
                                       return_sequences=True))
            model.add(Dropout(0.2))
            model.add(BatchNormalization())

            for l in range(conv_layer-1):
                model.add(LSTM(layer_size, return_sequences=True))
                model.add(Dropout(0.1))
                model.add(BatchNormalization())

            for l in range(dense_layer):
                model.add(Dense(layer_size, activation='relu'))
                model.add(Dropout(0.2))

            model.add(Dense(2, activation='softmax'))

            opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)

            # Compile model
            model.compile(loss='sparse_categorical_crossentropy',
                          optimizer=opt,
                          metrics=['accuracy'])

            # unique file name that will include the epoch 
            # and the validation acc for that epoch
            filepath = "RNN_Final.{epoch:02d}-{val_accuracy:.3f}"  
            checkpoint = ModelCheckpoint("models\{}.model".format(filepath, 
                         monitor='val_acc', verbose=0, save_best_only=True, 
                         mode='max')) # saves only the best ones

            # Train model
            history = model.fit(
                train_x, train_y,
                batch_size=BATCH_SIZE,
                epochs=EPOCHS,
                validation_data=(validation_x, validation_y),
                callbacks=[tensorboard, checkpoint])

# Score model
score = model.evaluate(validation_x, validation_y, verbose=2)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
# Save model
model.save("models\{}".format(NAME))

Also, I don't know if it's possible to ask two problems within one question (I don't want to spam the site with problems anyone with a bit of Python experience could resolve within a minute), but I also have a problem with checkpoint saving. I want to save only the best-performing model (one model per NN specification - number of nodes/layers), but currently a checkpoint is saved after every epoch. If this is inappropriate to ask here, I can create another question for it.
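
While re-reading, I suspect the closing parenthesis of .format() is misplaced, so that monitor, save_best_only and mode are passed to format() (which silently ignores unused keyword arguments) rather than to ModelCheckpoint, which would explain the save-every-epoch behaviour. If so, the intended call would look something like this (untested; note that Keras 2.3 logs the metric as val_accuracy, matching my filepath template):

checkpoint = ModelCheckpoint(
    "models\{}.model".format(filepath),  # format() closes here
    monitor='val_accuracy',  # the metric name Keras 2.3 / TF 2.0 actually log
    verbose=0,
    save_best_only=True,     # only write when the monitored metric improves
    mode='max')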

Thank you very much for any help.

Asked Sep 27 '19 by Sly Shark


1 Answer

One source of the problem: a new model = Sequential() in each loop iteration does not remove the previous model; it remains built within its TensorFlow graph scope, and every new model = Sequential() adds another lingering construct until memory eventually overflows. To ensure a model is destroyed in full, run the following once you're done with it:

import gc
import tensorflow as tf
from tensorflow.keras import backend as K

del model                           # drop the Python reference to the model
gc.collect()                        # collect whatever `del` left behind
K.clear_session()                   # clear the Keras-side graph state
tf.compat.v1.reset_default_graph()  # the TF graph isn't the same as the Keras graph

gc is Python's garbage-collection module, which clears the remnant traces of the model after del. K.clear_session() is the main call; it clears the TensorFlow graph.
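
Concretely, that means running the cleanup at the end of every iteration of your nested loop, before the next model = Sequential() is built. A minimal sketch (the build/fit body is elided; everything else follows your variable names):

import gc
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            model = Sequential()
            # ... add layers, compile, fit and checkpoint as in your code ...

            # evaluate/save here, while the model still exists, then clean up:
            del model                           # drop the Python reference
            gc.collect()                        # collect what `del` left behind
            K.clear_session()                   # clear the Keras graph state
            tf.compat.v1.reset_default_graph()  # reset the TF graph as well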

Also, while your idea for model checkpointing, logging, and hyperparameter search is sound, the execution is quite faulty; as set up, you will actually be testing only one hyperparameter combination for the entire nested loop. But that should be asked in a separate question.


UPDATE: I just encountered the same problem on a fully, properly set-up environment; the likeliest conclusion is that it's a bug, and a definite culprit is eager execution. To work around it, use

tf.compat.v1.disable_eager_execution() # right after `import tensorflow as tf`

to switch to Graph mode, which can also run significantly faster. Also see the updated cleanup code above.
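
To confirm the switch took effect, you can check tf.executing_eagerly() before building anything (a quick sanity check; this function exists in TF 2.x):

import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # must run before any other TF calls
print(tf.executing_eagerly())           # prints False once Graph mode is active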

Answered Oct 17 '22 by OverLordGoldDragon