If I want to train a model with train_generator, is there a significant difference between choosing
10 epochs with 500 steps each
and
100 epochs with 50 steps each?
Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first whether there is any downside to this.
An epoch consists of one full cycle through the training data. This usually spans many steps. As an example, if you have 2,000 images and use a batch size of 10, an epoch consists of 2,000 images / (10 images per step) = 200 steps.
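As a quick sketch of that arithmetic in code (the variable names here are only illustrative):
# Illustrative only: the worked example from the paragraph above.
import math
num_images = 2000
batch_size = 10
steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 200 steps make up one epoch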
The right number of epochs depends on the inherent perplexity (or complexity) of your dataset. A good rule of thumb is to start with a value that is 3 times the number of columns in your data. If you find that the model is still improving after all epochs complete, try again with a higher value.
At each step, the network takes in batch_size samples and updates its weights once, based on the mean loss over that batch. So the weights are updated once per step. steps_per_epoch simply indicates how many such batches are fed to the network in each epoch.
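To make "one weight update per step" concrete, here is a rough sketch of the loop that runs under the hood; model and train_gen are placeholders for a compiled Keras model and a generator yielding (x_batch, y_batch) pairs, not objects taken from the question:
# Sketch only: assumes `model` is a compiled Keras model and `train_gen`
# yields (x_batch, y_batch) tuples indefinitely.
epochs = 10
steps_per_epoch = 200
for epoch in range(epochs):
    for step in range(steps_per_epoch):
        x_batch, y_batch = next(train_gen)
        # One gradient update per step, driven by the mean loss over this batch.
        loss = model.train_on_batch(x_batch, y_batch)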
Generally a batch size of 32 or 25 is good, with epochs = 100, unless you have a large dataset. In the case of a large dataset you can go with a batch size of 10 and epochs between 50 and 100. The above figures have worked fine for me. The batch size should preferably be a power of 2.
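With Keras's built-in Model.fit, those rules of thumb translate into something like the following; model, x_train and y_train are assumed to exist already, and the numbers are only the suggestions above, not requirements:
# Assumes `model`, `x_train`, `y_train` are already defined.
model.fit(x_train, y_train, batch_size=32, epochs=100)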
Based on what you said, it sounds like you need a larger batch_size, and of course there are implications with that which could impact steps_per_epoch and the number of epochs. The things to work through are: how to solve the jumping around, the implications of a larger batch size, when to reduce epochs, and when to adjust steps per epoch.
Steps per epoch is not directly tied to the number of epochs. Naturally, what you want is that in one epoch your generator passes through all of your training data exactly once. To achieve this, set steps_per_epoch equal to the number of batches, like this:
import numpy as np  # assuming the training data is a NumPy array x_train
steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))  # number of batches covering the full training set
As the above equation shows, the larger the batch_size, the lower the steps_per_epoch. Next, you choose the number of epochs based on the validation performance you observe (choose whatever you think works best).
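Putting that together, a minimal sketch could look like this; model and train_generator are assumed to be a compiled Keras model and a generator that loops over the training data indefinitely (in current tf.keras, fit accepts the generator directly):
import numpy as np

batch_size = 32
# One epoch = the generator has covered every training sample once.
steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))

# Pick `epochs` by watching validation performance; 100 here is only a placeholder.
model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=100)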
Steps per epoch denotes the number of batches to be selected for one epoch. If 500 steps are selected, then the network will train on 500 batches to complete one epoch. If we select a large number of epochs, it can become computationally expensive.
steps_per_epoch tells the network how many batches to include in an epoch.
By definition, an epoch is considered complete when the dataset has been run through the model once in its entirety. In other words, it means that all training samples have been run through the model. (For further discussion, let us assume that the number of training examples is m.) Also by definition, we know that the batch size is between [1, m].
Below is what the TensorFlow documentation says about steps_per_epoch:
If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.
Now suppose that your training size is m = 128 and your batch size is b = 16, which means that your data is grouped into 8 batches. According to the above quote, the maximum value you can assign to steps_per_epoch is 8, as computed in one of the answers by @Ioannis Nasios. However, it is not necessary to set the value to exactly 8 (as in our example). You can choose any value between 1 and 8; you just need to be aware that training will then be performed with only that number of batches.
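As a hedged sketch of this example (model and train_dataset are placeholders; any generator or tf.data dataset that keeps producing batches of 16 would do):
m = 128             # number of training samples
b = 16              # batch size
max_steps = m // b  # 8 batches in total

# Any value from 1 to 8 is valid; with 4, each epoch trains on only half of the batches.
model.fit(train_dataset, steps_per_epoch=4, epochs=10)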
The reason for the jumpy error values could be the size of your batch, as correctly mentioned in this answer by @Chris Farr.
The TensorFlow guide section "Training & evaluation from tf.data Datasets" notes:
If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).
The advantage of a low value for steps_per_epoch is that different epochs are trained on different subsets of the data (a kind of regularization). However, if you have a limited training set, using only a subset of the batches may not be what you want. It is a decision one has to make.
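A sketch of the infinitely-looping case with tf.data (x_train, y_train and model are assumed to exist): because of repeat(), the dataset is never exhausted, so with a small steps_per_epoch successive epochs keep drawing different batches.
import tensorflow as tf

dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .shuffle(buffer_size=1024)
           .batch(16)
           .repeat())  # never exhausted, as described in the quote above

# Only 4 batches per epoch, so each epoch sees a different slice of the data.
model.fit(dataset, steps_per_epoch=4, epochs=10)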