I am really confused about the documentation of the TensorFlow Estimator function tf.estimator.inputs.numpy_input_fn here, and specifically about the line on num_epochs:
num_epochs: Integer, number of epochs to iterate over data. If None will run forever.
If I set num_epochs to None, will the training run forever?
What does it even mean for it to run forever?
It doesn't make sense to me, since I cannot imagine that people would design a program in such a way that it might run forever.
Could someone explain?
ANSWERING MY OWN QUESTION: I think I've found the answer here: https://www.tensorflow.org/versions/r1.3/get_started/input_fn#evaluating_the_model
Specifically, in the section "Building the input_fn":
Two additional arguments are provided: num_epochs: controls the number of epochs to iterate over data. For training, set this to None, so the input_fn keeps returning data until the required number of train steps is reached. For evaluate and predict, set this to 1, so the input_fn will iterate over the data once and then raise OutOfRangeError. That error will signal the Estimator to stop evaluate or predict.
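To make the quoted behaviour concrete, here is a minimal pure-Python sketch. It is not the TensorFlow API: make_batches and train are hypothetical helpers, and a plain StopIteration (the generator simply running out) stands in for the OutOfRangeError that the Estimator listens for. With num_epochs=None the input never runs out, so the step count alone decides when training stops; with num_epochs=1 the input is exhausted after one pass.

```python
import itertools

def make_batches(x, batch_size, num_epochs=None):
    """Hypothetical stand-in for an input_fn's num_epochs semantics.

    num_epochs=None -> cycle over the data forever.
    num_epochs=k    -> make k passes, then stop (the generator ending plays
                       the role OutOfRangeError plays for the Estimator).
    """
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size]

def train(batches, steps):
    """Run at most `steps` updates; return how many were actually run."""
    done = 0
    for _batch in itertools.islice(batches, steps):
        done += 1  # a real model would update its parameters here
    return done

data = list(range(10))
# Training: infinite input, so the requested step count decides when to stop.
print(train(make_batches(data, batch_size=4, num_epochs=None), steps=100))  # 100
# Evaluation: one pass over the data (batches of 4, 4 and 2), then it runs out.
print(sum(1 for _ in make_batches(data, batch_size=4, num_epochs=1)))  # 3
```

Note that asking for 100 steps with num_epochs=1 would only run 3 updates here, because the input runs out first.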
num_epochs: the maximum number of epochs (full passes in which every data point is seen once). steps: the number of parameter updates. You can update the parameters many times within one epoch when the batch size is smaller than the number of training examples.
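The relationship between steps, epochs and batch size is simple arithmetic. The sketch below assumes that each epoch ends with a possibly smaller final batch (so the count is rounded up); the function names are illustrative only and not part of any library.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Parameter updates needed to see every example once.

    Assumes each epoch ends with a (possibly smaller) final batch.
    """
    return math.ceil(num_examples / batch_size)

def total_steps(num_epochs: int, num_examples: int, batch_size: int) -> int:
    """Updates performed by `num_epochs` full passes over the data."""
    return num_epochs * steps_per_epoch(num_examples, batch_size)

print(steps_per_epoch(1000, 32))  # 32  (31 full batches + one batch of 8)
print(total_steps(5, 1000, 32))   # 160
```

So with 1000 examples and a batch size of 32, asking for 160 steps corresponds to about 5 epochs.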
Estimators provide a safe distributed training loop that controls how and when to load data, handle exceptions, create checkpoint files and recover from failures, and save summaries for TensorBoard.
The “train_op” and the scalar loss tensor are the minimum required arguments to create an “EstimatorSpec” for training.
It is recommended to use pre-made Estimators when just getting started. To write a TensorFlow program based on pre-made Estimators, you must perform the following tasks: create one or more input functions; define the model's feature columns.
If num_epochs is None, the input function will iterate over the dataset indefinitely. Training then runs until something stops it: the number of train steps you ask for, or you stopping it manually whenever you want. You could, for example, monitor your training and testing losses (and/or any other metrics) and stop training when your model converges or starts overfitting.
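Stopping an otherwise endless run when the model stops improving is usually called early stopping. Below is a minimal pure-Python sketch of that idea, not TensorFlow code: train_until_converged, patience and min_delta are hypothetical names, and the loss stream is made-up data standing in for evaluation results collected while training on an infinite input.

```python
import itertools

def train_until_converged(losses, patience, min_delta=0.0):
    """Consume an unbounded stream of validation losses and stop early.

    Returns (step_stopped, best_loss). Stops once the loss has not improved
    by more than `min_delta` for `patience` consecutive checks.
    """
    best = float("inf")
    since_best = 0
    for step, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best, since_best = loss, 0  # new best: reset the counter
        else:
            since_best += 1             # no improvement this check
        if since_best >= patience:
            return step, best
    return None, best  # only reached if the stream is finite

# Hypothetical loss curve: improves, then plateaus forever.
stream = itertools.chain([1.0, 0.5, 0.3, 0.25], itertools.repeat(0.26))
print(train_until_converged(stream, patience=3))  # (7, 0.25)
```

Even though the stream is infinite (like an input function with num_epochs=None), the loop terminates because the stopping rule, not the data, decides when to end.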