I am training a CNN model using tf.keras, passing training and validation generators as follows:
model.fit(
    x=training_data_generator,
    validation_data=validation_data_generator,
    epochs=n_epochs,
    use_multiprocessing=False,
    max_queue_size=100,
    workers=50
)
The generators are based on tf.keras.utils.Sequence.
The problem is, my data set is huge. Training one epoch takes about a day (despite training on two Titan RTX GPUs) and validation after each epoch takes a few hours.
During training I can see the progress displayed, but during validation all I see is the last snapshot of the training progress bar:
130339/130340 [==============================] - 147432s 1s/step
until validation finishes and I finally see my validation accuracy, loss, etc.
Is there a way to display a progress bar for validation?
I'm thinking of doing something like this:
for epoch in range(n_epochs):
    model.fit(
        x=training_data_generator,
        epochs=1,
        use_multiprocessing=False,
        max_queue_size=100,
        workers=50
    )
    validation_results = model.evaluate(
        x=validation_data_generator,
        use_multiprocessing=False,
        max_queue_size=100,
        workers=50
    )
    print(validation_results)
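If you stay with the loop, two small tweaks seem worthwhile: pass initial_epoch so that callbacks see increasing epoch indices instead of 0 on every call, and pass verbose=1 to evaluate(), which then draws its own progress bar over the validation batches. Below is a minimal sketch under stated assumptions: toy numpy arrays stand in for the question's generators so the snippet is self-contained, and use_multiprocessing/max_queue_size/workers are left out because Keras 3 moved them from fit()/evaluate() to the PyDataset constructor.

```python
import numpy as np
import tensorflow as tf

# Toy arrays stand in for training_data_generator / validation_data_generator (assumption).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(64, 8)).astype("float32")
y_train = rng.integers(0, 2, size=(64, 1)).astype("float32")
x_val = rng.normal(size=(32, 8)).astype("float32")
y_val = rng.integers(0, 2, size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

n_epochs = 2
val_history = []
for epoch in range(n_epochs):
    # initial_epoch=epoch makes callbacks see epoch indices 0, 1, 2, ...
    # instead of 0 on every call.
    model.fit(x_train, y_train, batch_size=16,
              initial_epoch=epoch, epochs=epoch + 1, verbose=1)
    # verbose=1 asks evaluate() to draw a progress bar over the validation batches.
    results = model.evaluate(x_val, y_val, batch_size=16, verbose=1, return_dict=True)
    val_history.append(results)
    print(f"epoch {epoch + 1}: {results}")
```

With the question's Sequence objects, the arrays would be replaced by training_data_generator and validation_data_generator, and batch_size would be dropped, since the Sequence already yields batches.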
Another option I was considering is to create a custom callback that validates the model in on_epoch_end, but this seems very non-standard.
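The callback route may not need to run validation itself. As far as I know, from TF 2.2 on, fit() runs validation through evaluate() with the same callback list, so the on_test_begin and on_test_batch_end hooks of your callbacks fire during validation even while the built-in bar stays frozen. Below is a minimal sketch that draws a separate bar with tf.keras.utils.Progbar. The class name ValidationProgbar and its val_steps argument are hypothetical.

```python
import numpy as np
import tensorflow as tf


class ValidationProgbar(tf.keras.callbacks.Callback):
    """Draws a progress bar for the validation pass that fit() runs after each epoch.

    `val_steps` (hypothetical parameter name) should be the number of validation
    batches, e.g. len(validation_data_generator) for a Sequence.
    """

    def __init__(self, val_steps):
        super().__init__()
        self.val_steps = val_steps
        self.progbar = None
        self.seen = 0  # validation batches completed in the current pass

    def on_test_begin(self, logs=None):
        # Move to a fresh line so the unfinished training bar stays visible.
        print()
        self.progbar = tf.keras.utils.Progbar(target=self.val_steps)
        self.seen = 0

    def on_test_batch_end(self, batch, logs=None):
        # `batch` is the 0-based index of the validation batch that just finished.
        self.seen = batch + 1
        self.progbar.update(self.seen)
```

With the question's setup, this would be added as callbacks=[ValidationProgbar(len(validation_data_generator))] to the original model.fit call. It is meant for fit() only, because a standalone model.evaluate(verbose=1) already draws its own bar and you would get two.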
Is there a better approach to this?
You can set steps_per_epoch in the fit method.
Based on the documentation:
Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch will run until the input dataset is exhausted. This argument is not supported with array inputs.
This lets you cap the number of steps per epoch. With a lower value, each epoch ends sooner, so Keras reports the validation loss and accuracy much more often.
Because each epoch then covers only part of the data, you need to increase epochs to compensate.
For example, with steps_per_epoch=1000 you will see the training and validation loss and accuracy after every 1000 steps, rather than only after the entire dataset has been exhausted.
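The question's progress bar shows 130340 batches per pass, which makes it possible to estimate how large epochs must be. The sketch below computes the smallest epochs value whose total batch count (epochs * steps_per_epoch) reaches a given number of passes. The helper name epochs_for_batch_budget is hypothetical. It counts batches only and does not assume which batches each short epoch draws.

```python
import math


def epochs_for_batch_budget(batches_per_pass, steps_per_epoch, n_passes=1):
    """Smallest `epochs` with epochs * steps_per_epoch >= n_passes * batches_per_pass."""
    return math.ceil(n_passes * batches_per_pass / steps_per_epoch)


# 130340 batches per pass is the figure from the question's progress bar.
short_epochs = epochs_for_batch_budget(130340, steps_per_epoch=1000)
print(short_epochs)  # 131
# Unless validation_steps is set, every short epoch also ends with a full validation pass.
print(f"{short_epochs} validation runs per pass over the training data")
```

The second line of output matters for this question. If one full validation takes a few hours, running it 131 times per pass over the training data would dominate the training time. Capping it with fit()'s validation_steps argument avoids this.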
history = model.fit(
    x=training_data_generator,
    # ~131 short epochs of 1000 steps add up to about one pass over 130340 batches
    epochs=n_epochs * 131,
    steps_per_epoch=1000,
    # We pass validation data for monitoring
    # validation loss and metrics at the end of each epoch
    validation_data=validation_data_generator)
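The sketch below runs the whole approach on toy data. A hypothetical ArraySequence stands in for the question's generators, steps_per_epoch shortens each epoch, and validation_steps (a real fit() argument) limits how many validation batches are scored at each epoch end. Without validation_steps, every short epoch would run the whole validation set. A capped run scores a subset, so the val_* numbers are noisier, and they may be unrepresentative if the validation batches are ordered (e.g. by class). steps_per_epoch only changes how often metrics are reported; it does not add a validation progress bar. Whether the next short epoch resumes where the previous one stopped or restarts the Sequence from its first batch seems to differ between Keras versions, so it is worth checking on your own setup.

```python
import math

import numpy as np
import tensorflow as tf


class ArraySequence(tf.keras.utils.Sequence):
    """Hypothetical stand-in for the question's Sequence-based generators."""

    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches in one pass over the data.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s]


def toy_sequence(n, rng):
    # Random features and binary labels, served in batches of 10.
    x = rng.normal(size=(n, 8)).astype("float32")
    y = rng.integers(0, 2, size=(n, 1)).astype("float32")
    return ArraySequence(x, y, batch_size=10)


rng = np.random.default_rng(0)
train_gen = toy_sequence(400, rng)  # 40 batches per pass
val_gen = toy_sequence(200, rng)    # 20 batches per pass

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    train_gen,
    epochs=8,            # 8 short epochs x 10 steps = 80 training batches in total
    steps_per_epoch=10,  # metrics (including val_*) are reported every 10 batches
    validation_data=val_gen,
    validation_steps=5,  # score only 5 of the 20 validation batches each time
    verbose=2,
)
print(history.history["val_loss"])
```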