I'm trying to decide whether to keep using the existing keras.utils.Sequence class or to switch to tf.data. From what I understand, tf.data optimizes performance by overlapping training on the GPU with preprocessing on the CPU. But how does that compare to keras.utils.Sequence and the Keras data generator? From what I read here, it seems to be doing the same thing. Is there anything to gain by switching to tf.data?
Both approaches overlap input data preprocessing with model training. keras.utils.Sequence can do this by running multiple Python processes (when use_multiprocessing=True), while tf.data does this by running multiple C++ threads.
If your preprocessing is being done by a non-TensorFlow Python library such as PIL, keras.utils.Sequence may work better for you, since multiple processes are needed to avoid contention on Python's global interpreter lock.
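As a rough illustration of the Sequence route, here is a minimal sketch. The class name ArraySequence, the random in-memory arrays, and the NumPy scaling step are all assumptions made for the example; the NumPy step simply stands in for whatever Python-side preprocessing (PIL or similar) you actually do.

```python
import math

import numpy as np
import tensorflow as tf


class ArraySequence(tf.keras.utils.Sequence):
    """Hypothetical Sequence that serves batches from in-memory arrays."""

    def __init__(self, images, labels, batch_size=32):
        super().__init__()
        self.images = images
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # Pure-Python/NumPy preprocessing; this is the kind of work that
        # holds the GIL and benefits from separate worker processes.
        batch = self.images[lo:hi].astype("float32") / 255.0
        return batch, self.labels[lo:hi]


# Made-up data: 100 tiny RGB "images" with binary labels.
images = np.random.randint(0, 256, size=(100, 8, 8, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=(100,))
seq = ArraySequence(images, labels, batch_size=32)
```

You would then pass seq to model.fit. Depending on the Keras version, the worker count and use_multiprocessing are given either as arguments to fit or to the Sequence constructor, so check the documentation for the version you have installed.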
If you can express your preprocessing using TensorFlow operations, I would expect tf.data to give better performance.
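For comparison, a tf.data version of a similar pipeline might look like the sketch below. The random arrays and the preprocess function are assumptions for illustration only; the point is that the preprocessing is written with TensorFlow ops, so map can run it in parallel and prefetch can overlap it with training.

```python
import numpy as np
import tensorflow as tf

# Made-up data: 100 tiny RGB "images" with binary labels.
images = np.random.randint(0, 256, size=(100, 8, 8, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=(100,)).astype(np.int32)


def preprocess(image, label):
    # Expressed with TensorFlow ops, so it runs inside the tf.data runtime
    # rather than in the Python interpreter.
    image = tf.cast(image, tf.float32) / 255.0
    return image, label


dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare upcoming batches while the model trains
)
```

The resulting dataset can be passed directly to model.fit. tf.data.AUTOTUNE lets the runtime pick the parallelism and buffer sizes; fixed integers work too if you prefer to tune them by hand.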
Some other things to consider:
- tf.data is the recommended approach for building scalable input pipelines for tf.keras.
- tf.data is used more widely than keras.utils.Sequence, so it may be easier to search for help with getting good performance.