Keras predict loop memory leak using tf.data.Dataset but not with a numpy array

I encounter a memory leak and decreasing performance when looping over a Keras model predict function when using a tf.data.Dataset to feed the model, but not when feeding it with a numpy array.

Does anyone understand what is causing this and/or how to resolve the issue?

Minimal reproducible code snippet (copy/paste runnable):

import tensorflow as tf
import numpy as np
import time

SIZE = 5000

inp = tf.keras.layers.Input(shape=(SIZE,), dtype='float32')
x = tf.keras.layers.Dense(units=SIZE)(inp)

model = tf.keras.Model(inputs=inp, outputs=x)

np_data = np.random.rand(1, SIZE)
ds = tf.data.Dataset.from_tensor_slices(np_data).batch(1).repeat()

debug_time = time.time()
while True:
    model.predict(x=ds, steps=1)
    print('Processing {:.2f}'.format(time.time() - debug_time))
    debug_time = time.time()

Result: Predict loop timing starts around 0.04s per iteration; within a minute or two it's up to about 0.5s, and process memory keeps growing from a few hundred MB toward a GB.


Swap out the tf.data.Dataset for an equivalent numpy array and the runtime is a consistent ~0.01s per iteration.

Working case code snippet (copy/paste runnable):

import tensorflow as tf
import numpy as np
import time

SIZE = 5000

inp = tf.keras.layers.Input(shape=(SIZE,), dtype='float32')
x = tf.keras.layers.Dense(units=SIZE)(inp)

model = tf.keras.Model(inputs=inp, outputs=x)

np_data = np.random.rand(1, SIZE)

debug_time = time.time()
while True:
    model.predict(x=np_data)  # using numpy array directly
    print('Processing {:.2f}'.format(time.time() - debug_time))
    debug_time = time.time()

Related discussions:

  • Memory leak tf.data + Keras - doesn't seem to address the core issue, but the question appears similar.
  • https://github.com/tensorflow/tensorflow/issues/22098 - possibly the same open issue, but I can't confirm it; changing inter_op_parallelism as suggested in that thread has no impact on the results posted here.

Additional info:

  • I can reduce the rate of performance degradation by around 10x by passing in an iterator instead of a dataset object. I noticed in training_utils.py:1314 that the Keras code creates a new iterator on each call to predict.
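For intuition about why per-call iterator creation degrades like this, here is a minimal pure-Python sketch (no TensorFlow; the Graph and make_iterator names are hypothetical stand-ins, not real TF APIs). It mimics a graph in which every created op stays registered forever, so constructing the iterator inside the loop accumulates state, while hoisting it out keeps the graph size constant:

```python
# Toy model of a TF-1.x-style graph: ops are registered once and never freed.
class Graph:
    def __init__(self):
        self.ops = []

    def make_iterator(self):
        # Each iterator registers a new op in the graph, mimicking Keras
        # creating dataset/iterator ops on every predict() call.
        self.ops.append(object())


def predict_per_call(graph, n_calls):
    """Anti-pattern: a fresh iterator per call, so the graph grows with n_calls."""
    for _ in range(n_calls):
        graph.make_iterator()
    return len(graph.ops)


def predict_hoisted(graph, n_calls):
    """Fix: create the iterator once and reuse it; graph size stays constant."""
    graph.make_iterator()
    for _ in range(n_calls):
        pass  # reuse the existing iterator on every call
    return len(graph.ops)


print(predict_per_call(Graph(), 100))  # -> 100 ops accumulated
print(predict_hoisted(Graph(), 100))   # -> 1 op, regardless of call count
```

The accumulated ops stand in for the growing graph and memory in the real case; the larger the graph gets, the more each iteration costs, which matches the steadily increasing per-call timing observed above.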

TF 1.14.0

asked Jul 06 '19 by David Parks
1 Answer

The root of the problem appears to be that Keras is creating dataset operations on each predict loop: at training_utils.py:1314, a dataset iterator is created on every call to predict.

The problem can be reduced in severity by passing in an iterator, and is solved entirely by passing in the iterator's get_next() tensor.

I have posted the issue on the Tensorflow Github page: https://github.com/tensorflow/tensorflow/issues/30448

Here is the solution. This example runs in constant time using the TF dataset; you just can't pass in the dataset object itself:

import tensorflow as tf
import numpy as np
import time

SIZE = 5000

inp = tf.keras.layers.Input(shape=(SIZE,), dtype='float32')
x = tf.keras.layers.Dense(units=SIZE)(inp)

model = tf.keras.Model(inputs=inp, outputs=x)

np_data = np.random.rand(1, SIZE)
ds = tf.data.Dataset.from_tensor_slices(np_data).batch(1).repeat()
it = tf.data.make_one_shot_iterator(ds)  # create the iterator once, outside the loop
tensor = it.get_next()                   # pass this tensor to predict, not the dataset

debug_time = time.time()
while True:
    model.predict(x=tensor, steps=1)
    print('Processing {:.2f}'.format(time.time() - debug_time))
    debug_time = time.time()

answered Sep 25 '22 by David Parks