I'm writing some code to optimize a neural net architecture, and so I have a Python function create_nn(params) that creates and initializes a Keras model.
However, the problem I'm having is that after a few iterations the models take a lot longer to train than usual (initially one epoch takes 10sec, and then after roughly the 14th model (each model trains for 20 epochs) it takes 60sec/epoch).
I know that this is not because of the evolving architecture, because if I restart the script and start where it ended, it is back to normal speeds.
I'm currently running
from keras import backend as K
and then calling
K.clear_session()
after training each new model.
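To make the setup concrete, the outer loop looks roughly like this (a simplified sketch; parameter_sets, X_train, and y_train are placeholders, and the actual search logic is omitted):

from keras import backend as K

for params in parameter_sets:  # placeholder for the architecture search
    model = create_nn(params['features'], params['timesteps'],
                      params['number_of_filters'])
    model.fit(X_train, y_train, epochs=20, verbose=0)
    # ... evaluate and record results ...
    K.clear_session()  # called after training every model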
Some additional details:
For the first 12 models, training time per epoch remains roughly constant at 10sec/epoch. Then, starting at the 13th model, training time per epoch climbs steadily to 60sec, and it hovers around 60sec/epoch from then on.
I'm running Keras with TensorFlow as the backend
I'm using an Amazon EC2 t2.xlarge instance
There is plenty of free RAM (7GB free, with a dataset of size 5GB)
I've removed a bunch of layers and parameters, but essentially create_nn looks like this:
from keras.layers import (Input, GaussianNoise, Convolution1D, Activation,
                          Flatten, Dense, BatchNormalization, Dropout)
from keras.models import Model

def create_nn(features, timesteps, number_of_filters):
    inputs = Input(shape=(timesteps, features))
    x = GaussianNoise(stddev=0.005)(inputs)
    # Layer 1.1
    x = Convolution1D(number_of_filters, 3, padding='valid')(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = Dense(10)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.5)(x)
    # Output layer
    outputs = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=outputs)
    # Compile and return
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print('CNN model built successfully.')
    return model
Note that while a Sequential model would've worked in this dummy example, the functional API is required for the actual use case.
How can I fix this problem?
Short answer: you need to call tf.keras.backend.clear_session() before every new model that you create.
This problem only seems to happen when eager execution is turned off.
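If you're not sure which mode your script is in, you can check at runtime with tf.executing_eagerly() (part of the public TensorFlow 2.x API):

import tensorflow as tf

print(tf.executing_eagerly())  # True by default in TF 2.x; False after
                               # tf.compat.v1.disable_eager_execution()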
Okay, so let's run an experiment with and without clear_session. The code for make_model is at the end of this answer.
First, let's look at the training time when using clear_session. We'll run this experiment 10 times and print the results:
non_seq_time = [ make_model(clear_session=True) for _ in range(10)]
non sequential
Elapse = 1.06039
Elapse = 1.20795
Elapse = 1.04357
Elapse = 1.03374
Elapse = 1.02445
Elapse = 1.00673
Elapse = 1.01712
Elapse = 1.021
Elapse = 1.17026
Elapse = 1.04961
As you can see, the training time stays roughly constant.
Now let's re-run the experiment without using clear_session and review the training time:
non_seq_time = [ make_model(clear_session=False) for _ in range(10)]
non sequential
Elapse = 1.10954
Elapse = 1.13042
Elapse = 1.12863
Elapse = 1.1772
Elapse = 1.2013
Elapse = 1.31054
Elapse = 1.27734
Elapse = 1.32465
Elapse = 1.32387
Elapse = 1.33252
As you can see, the training time increases without clear_session.
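The reason is that with eager execution disabled, every call to make_model adds new operations to the same default TensorFlow graph, so the graph Keras has to execute keeps growing; clear_session() resets it. Here's a minimal sketch that makes the growth visible (it uses the tf.compat.v1 graph API and is separate from the experiment code below):

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

for i in range(3):
    # Without clear_session, ops accumulate across iterations.
    # Uncomment the next line to reset the graph each time:
    # tf.keras.backend.clear_session()
    inputs = tf.keras.layers.Input(shape=[784])
    outputs = tf.keras.layers.Dense(10)(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    n_ops = len(tf.compat.v1.get_default_graph().get_operations())
    print(f"iteration {i}: {n_ops} ops in the default graph")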
# Training time increases - and how to fix it
# Setup and imports
# %tensorflow_version 2.x
import tensorflow as tf
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
from time import time

# if you comment this out, the problem doesn't happen
# it only happens when eager execution is disabled !!
tf.compat.v1.disable_eager_execution()

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten the 28x28 images so they match the Input(shape=[784]) layer below
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Let's build that network
def make_model(activation="relu", hidden=2, units=100, clear_session=False):
    # -----------------------------------
    # . HERE WE CAN TOGGLE CLEAR SESSION
    # -----------------------------------
    if clear_session:
        tf.keras.backend.clear_session()

    start = time()
    inputs = layers.Input(shape=[784])
    x = inputs
    for num in range(hidden):
        x = layers.Dense(units=units, activation=activation)(x)
    outputs = layers.Dense(units=10, activation="softmax")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    results = model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=200, verbose=0)
    elapse = time() - start
    print(f"Elapse = {elapse:8.6}")
    return elapse

# Let's try it out and time it
# prime it first
make_model()
print("Use clear session")
non_seq_time = [make_model(clear_session=True) for _ in range(10)]
print("Don't use clear session")
non_seq_time = [make_model(clear_session=False) for _ in range(10)]