I'm writing some code to optimize a neural net architecture, and so I have a Python function create_nn(params) that creates and initializes a Keras model.
However, the problem I'm having is that after a few iterations the models take a lot longer to train than usual (initially one epoch takes 10sec, and then after roughly the 14th model (each model trains for 20 epochs) it takes 60sec/epoch).
I know that this is not because of the evolving architecture, because if I restart the script and start where it ended, it is back to normal speeds.
I'm currently running
from keras import backend as K
and then calling
K.clear_session()
after training each new model.
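To make the setup concrete, the outer loop looks roughly like this (a simplified sketch; parameter_sets, X_train, and y_train are placeholders, and the actual search logic is omitted):

from keras import backend as K

for params in parameter_sets:  # placeholder for the architecture search
    model = create_nn(params['features'], params['timesteps'],
                      params['number_of_filters'])
    model.fit(X_train, y_train, epochs=20, verbose=0)
    # ... evaluate and record results ...
    K.clear_session()  # called after training every model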
Some additional details:
For the first 12 models, training time per epoch remains roughly constant at 10sec/epoch. Then, starting at the 13th model, training time per epoch climbs steadily to 60sec, and it hovers around 60sec/epoch from then on.
I'm running Keras with TensorFlow as the backend
I'm using an Amazon EC2 t2.xlarge instance
There is plenty of free RAM (7GB free, with a dataset of size 5GB)
I've removed a bunch of layers and parameters, but essentially create_nn looks like this:
from keras.layers import (Input, GaussianNoise, Convolution1D, Activation,
                          Flatten, Dense, BatchNormalization, Dropout)
from keras.models import Model

def create_nn(features, timesteps, number_of_filters):
    inputs = Input(shape=(timesteps, features))
    x = GaussianNoise(stddev=0.005)(inputs)
    # Layer 1.1
    x = Convolution1D(number_of_filters, 3, padding='valid')(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = Dense(10)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.5)(x)
    # Output layer
    outputs = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=outputs)
    # Compile and return
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print('CNN model built successfully.')
    return model
Note that while a Sequential model would've worked in this dummy example, the functional API is required for the actual use case.
How can I fix this problem?
Short answer: you need to call tf.keras.backend.clear_session() before every new model that you create.
This problem only seems to happen when eager execution is turned off.
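If you're not sure which mode your script is in, you can check at runtime with tf.executing_eagerly() (part of the public TensorFlow 2.x API):

import tensorflow as tf

print(tf.executing_eagerly())  # True by default in TF 2.x; False after
                               # tf.compat.v1.disable_eager_execution()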
Okay, so let's run an experiment with and without clear_session. The code for make_model is at the end of this answer.
First, let's look at the training time when using clear_session. We'll run this experiment 10 times and print the results:
non_seq_time = [ make_model(clear_session=True) for _ in range(10)]
non sequential
Elapse = 1.06039
Elapse = 1.20795
Elapse = 1.04357
Elapse = 1.03374
Elapse = 1.02445
Elapse = 1.00673
Elapse = 1.01712
Elapse = 1.021
Elapse = 1.17026
Elapse = 1.04961
As you can see, the training time stays roughly constant.
Now let's re-run the experiment without using clear_session and review the training time:
non_seq_time = [ make_model(clear_session=False) for _ in range(10)]
non sequential
Elapse = 1.10954
Elapse = 1.13042
Elapse = 1.12863
Elapse = 1.1772
Elapse = 1.2013
Elapse = 1.31054
Elapse = 1.27734
Elapse = 1.32465
Elapse = 1.32387
Elapse = 1.33252
As you can see, the training time increases without clear_session.
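The reason is that with eager execution disabled, every call to make_model adds new operations to the same default TensorFlow graph, so the graph Keras has to execute keeps growing; clear_session() resets it. Here's a minimal sketch that makes the growth visible (it uses the tf.compat.v1 graph API and is separate from the experiment code below):

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

for i in range(3):
    # Without clear_session, ops accumulate across iterations.
    # Uncomment the next line to reset the graph each time:
    # tf.keras.backend.clear_session()
    inputs = tf.keras.layers.Input(shape=[784])
    outputs = tf.keras.layers.Dense(10)(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    n_ops = len(tf.compat.v1.get_default_graph().get_operations())
    print(f"iteration {i}: {n_ops} ops in the default graph")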
# Training time increases - and how to fix it
# Setup and imports
# %tensorflow_version 2.x
import tensorflow as tf
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
from time import time

# if you comment this out, the problem doesn't happen
# it only happens when eager execution is disabled !!
tf.compat.v1.disable_eager_execution()

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten the 28x28 images so they match the Input(shape=[784]) layer below
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Let's build that network
def make_model(activation="relu", hidden=2, units=100, clear_session=False):
    # -----------------------------------
    # . HERE WE CAN TOGGLE CLEAR SESSION
    # -----------------------------------
    if clear_session:
        tf.keras.backend.clear_session()

    start = time()
    inputs = layers.Input(shape=[784])
    x = inputs
    for num in range(hidden):
        x = layers.Dense(units=units, activation=activation)(x)
    outputs = layers.Dense(units=10, activation="softmax")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    results = model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=200, verbose=0)
    elapse = time() - start
    print(f"Elapse = {elapse:8.6}")
    return elapse

# Let's try it out and time it
# prime it first
make_model()
print("Use clear session")
non_seq_time = [make_model(clear_session=True) for _ in range(10)]
print("Don't use clear session")
non_seq_time = [make_model(clear_session=False) for _ in range(10)]