 

How to monitor validation loss in the training of estimators in TensorFlow?

I want to ask how to monitor validation loss during the training of Estimators in TensorFlow. I have checked a similar question asked before (validation during training of Estimator), but it did not help much.

If I use Estimators to build a model, I give an input function to Estimator.train(). But there is no way to add separate validation_x and validation_y data to the training process, so once training starts I can only see the training loss. The training loss is expected to keep decreasing the longer training runs, but that information does not help prevent overfitting. The more valuable information is the validation loss, which usually follows a U-shape as a function of the number of epochs. To prevent overfitting, we want to find the number of epochs at which the validation loss is at its minimum.

So this is my problem: how can I get the validation loss for each epoch during training with Estimators?

Han M asked Nov 09 '18 20:11

People also ask

How does TensorFlow data validation construct a schema?

Instead of constructing a schema manually from scratch, a developer can rely on TensorFlow Data Validation's automatic schema construction. Specifically, TensorFlow Data Validation automatically constructs an initial schema based on statistics computed over training data available in the pipeline.

What is the TensorFlow data API?

The tf.data API is a set of utilities in TensorFlow 2.0 for loading and preprocessing data in a way that's fast and scalable. For a complete guide about creating Datasets, see the tf.data documentation. You can pass a Dataset instance directly to the methods fit(), evaluate(), and predict().
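A quick sketch of passing a Dataset straight to fit(), using made-up toy arrays and a one-layer model (the shapes and sizes here are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Toy arrays; in practice these would be your real features and labels.
x = np.random.rand(12, 3).astype(np.float32)
y = np.random.randint(0, 2, size=12).astype(np.float32)

# Build a batched Dataset and pass it directly to fit() --
# no separate y argument or batch_size argument is needed.
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(12).batch(4)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(3,)),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
history = model.fit(dataset, epochs=1, verbose=0)
```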

What is the earlystopping callback in TensorFlow?

If a metric doesn’t change by a minimum delta in a given number of epochs, the EarlyStopping callback kills the training process. For example, if validation accuracy doesn’t increase at least 0.001 in 10 epochs, this callback tells TensorFlow to stop the training. There’s not much to it — it’s simple but extremely useful.
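A hedged sketch of that rule, monitoring val_loss rather than accuracy since the toy model below defines no accuracy metric (the data, model, and epoch count are all placeholders):

```python
import numpy as np
import tensorflow as tf

# Toy data and model; the EarlyStopping callback is the point here.
x = np.random.rand(64, 4).astype(np.float32)
y = np.random.rand(64).astype(np.float32)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')

# Stop when val_loss fails to improve by at least 0.001 for 10 straight epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0.001, patience=10,
    restore_best_weights=True)

history = model.fit(x, y, validation_split=0.25, epochs=50,
                    callbacks=[early_stop], verbose=0)
```

With restore_best_weights=True the model is rolled back to the epoch with the best monitored value when training stops.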

What is distribution skew in TensorFlow data validation?

TensorFlow Data Validation can detect distribution skew between training and serving data. Distribution skew occurs when the distribution of feature values for training data is significantly different from serving data.


1 Answer

You need to create a validation input_fn and either alternate calls to estimator.train() and estimator.evaluate(), or simply use tf.estimator.train_and_evaluate().

x_train, y_train = ...
x_val, y_val = ...

...

# For example, if the arrays are numpy arrays < 2 GB.
# input_fn must be a callable that returns the Dataset.
def train_input_fn():
    return tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)

def val_input_fn():
    return tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)

...

estimator = ...

for epoch in range(n_epochs):
    estimator.train(input_fn=train_input_fn)
    estimator.evaluate(input_fn=val_input_fn)

estimator.evaluate() will compute the loss and any other metrics that are defined in your model_fn and will save the events in a new "eval" directory inside your job_dir.
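The tf.estimator.train_and_evaluate() route mentioned above can be sketched end to end like this; the LinearRegressor, the toy numpy data, and the step counts are all stand-ins for your real model and splits:

```python
import numpy as np
import tensorflow as tf

# Toy regression data; stand-ins for real train/validation splits.
x_train = np.random.rand(100, 1).astype(np.float32)
y_train = 2.0 * x_train[:, 0]
x_val = np.random.rand(20, 1).astype(np.float32)
y_val = 2.0 * x_val[:, 0]

def train_input_fn():
    return (tf.data.Dataset.from_tensor_slices(({'x': x_train}, y_train))
            .shuffle(100).batch(16).repeat())

def val_input_fn():
    return tf.data.Dataset.from_tensor_slices(({'x': x_val}, y_val)).batch(16)

estimator = tf.estimator.LinearRegressor(
    feature_columns=[tf.feature_column.numeric_column('x')])

# TrainSpec bounds training; EvalSpec runs evaluation on the validation input_fn.
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100)
eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
                                  start_delay_secs=0, throttle_secs=0)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

metrics = estimator.evaluate(input_fn=val_input_fn)
```

Setting start_delay_secs and throttle_secs to 0 makes evaluation run as often as possible in a local run; the defaults delay it considerably.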

Olivier Dehaene answered Oct 11 '22 22:10