 

Choosing number of Steps per Epoch

If I want to train a model with train_generator, is there a significant difference between choosing

  • 10 Epochs with 500 Steps each

and

  • 100 Epochs with 50 Steps each

Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first whether there is any downside to this.
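For reference, here is a rough sketch of the two setups with a toy model and generator (the model, data, and names are just placeholders, not my actual setup):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

def train_generator(batch_size=32):
    # Infinite generator of random batches (stand-in for real data)
    while True:
        x = np.random.rand(batch_size, 8).astype("float32")
        y = np.random.rand(batch_size, 1).astype("float32")
        yield x, y

# Option A: 10 epochs x 500 steps = 5,000 weight updates, 10 history points
model.fit(train_generator(), steps_per_epoch=500, epochs=10, verbose=0)

# Option B: 100 epochs x 50 steps = the same 5,000 updates, but 100 history
# points, so the loss curve has more data points and looks smoother
model.fit(train_generator(), steps_per_epoch=50, epochs=100, verbose=0)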

asked Apr 19 '18 by n.st

People also ask

How do you choose steps per epoch?

An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10 an epoch consists of 2,000 images / (10 images / step) = 200 steps.

How do you choose optimal number of epochs?

The right number of epochs depends on the inherent perplexity (or complexity) of your dataset. A good rule of thumb is to start with a value that is 3 times the number of columns in your data. If you find that the model is still improving after all epochs complete, try again with a higher value.

What should be the steps per epoch in Keras?

At each step, the network takes in one batch of samples and updates its weights based on the mean loss over that batch, so the weights are updated once per step. The steps per epoch simply indicate how many batches of the dataset are fed to the network in each epoch.

How do you choose optimal batch size and epochs?

Generally, a batch size of 32 or 25 works well, with around 100 epochs, unless you have a large dataset. In the case of a large dataset you can go with a batch size of 10 and between 50 and 100 epochs. These figures have worked fine in practice, and the batch size should preferably be a power of 2.


4 Answers

Based on what you said, it sounds like you need a larger batch_size, and of course there are implications of that which could affect the steps_per_epoch and number of epochs.

To reduce the jumping around

  • A larger batch size will give you a less noisy gradient estimate and will help to prevent the jumping around
  • You may also want to consider a smaller learning rate, or a learning rate scheduler (or decay) to allow the network to "settle in" as it trains (a short sketch follows this list)
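For example, a rough sketch of a smaller learning rate plus exponential decay; the tiny model and the specific numbers here are illustrative, not tuned values:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The learning rate shrinks by ~4% every 1,000 steps, letting the network
# "settle in" instead of jumping around the minimum
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,   # smaller than the Keras Adam default of 1e-3
    decay_steps=1000,
    decay_rate=0.96,
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)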

Implications of a larger batch-size

  • Too large of a batch_size can produce memory problems, especially if you are using a GPU. Once you exceed the limit, dial it back until it works. This will help you find the maximum batch size that your system can work with (a rough probe sketch follows this list).
  • Too large of a batch size can also get you stuck in a local minimum, so if your training gets stuck, I would reduce it somewhat. Imagine that here you are over-correcting the jumping around, and the training is no longer jumping around enough to further minimize the loss function.
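A rough way to automate the "dial it back" search might look like this; the helper name, candidate sizes, and stand-in model are made up for illustration, and real OOM recovery can be messier than a simple retry:

import numpy as np
import tensorflow as tf

def find_max_batch_size(build_model, x, y, candidates=(512, 256, 128, 64, 32, 16)):
    # Try batch sizes from large to small; keep the first one that fits in memory
    for bs in candidates:
        try:
            model = build_model()
            model.fit(x[:bs], y[:bs], batch_size=bs, epochs=1, verbose=0)
            return bs
        except tf.errors.ResourceExhaustedError:
            continue  # out of GPU memory at this size, dial it back
    return None

# Example usage with stand-in data and a tiny compiled model
x = np.random.rand(1024, 8).astype("float32")
y = np.random.rand(1024, 1).astype("float32")

def build_model():
    m = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])
    m.compile(optimizer="adam", loss="mse")
    return m

max_bs = find_max_batch_size(build_model, x, y)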

When to reduce epochs

  • If your training error is very low, yet your test/validation error is very high, then you have over-fit the model with too many epochs.
  • The best way to find the right balance is to use early stopping with a validation set. Here you can specify when to stop training, and save the weights for the network that gives you the best validation loss. (I highly recommend always using this; a minimal example follows this list.)
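A minimal early-stopping setup might look like this; the random stand-in data, tiny model, patience value, and file name are just for illustration:

import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 8).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

callbacks = [
    # Stop once val_loss hasn't improved for 5 epochs and keep the best weights
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Also save the best model to disk as training runs
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
]

# epochs is set high on purpose; early stopping decides when to actually stop
model.fit(x, y, validation_split=0.2, epochs=100, callbacks=callbacks, verbose=0)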

When to adjust steps-per-epoch

  • Traditionally, the steps per epoch is calculated as train_length // batch_size, since this will use all of the data points, one batch size worth at a time.
  • If you are augmenting the data, then you can stretch this a tad (sometimes I multiply that value by 2 or 3, etc.), as in the short sketch after this list. But if it's already training for too long, I would just stick with the traditional approach.
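For example, with purely illustrative numbers:

train_length = 10_000   # number of training samples
batch_size = 32

steps_per_epoch = train_length // batch_size        # 312: each sample seen about once per epoch
augmented_steps_per_epoch = 2 * steps_per_epoch     # 624: stretched when augmenting the data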
answered by Chris Farr


Steps per epoch is not something you should tie to the number of epochs.

Naturally, what you want is for your generator to pass through all of your training data exactly once per epoch. To achieve this, you should set steps per epoch equal to the number of batches, like this:

steps_per_epoch = int( np.ceil(x_train.shape[0] / batch_size) )

As you can see from the equation above, the larger the batch_size, the lower the steps_per_epoch.
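For example, with a hypothetical 2,000-sample training set:

import numpy as np

m = 2000                                   # number of training samples
for batch_size in (10, 50):
    steps_per_epoch = int(np.ceil(m / batch_size))
    print(batch_size, steps_per_epoch)     # 10 -> 200 steps, 50 -> 40 steps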

Next, you choose the number of epochs based on validation performance (choose whatever you think works best).

answered by Ioannis Nasios


The steps per epoch denote the number of batches selected for one epoch. If 500 steps are selected, then the network will train on 500 batches to complete one epoch. If we select a large number of epochs, it can be computationally expensive.

answered by Manish Vasandnani


steps_per_epoch tells the network how many batches to include in an epoch.

By definition, an epoch is considered complete when the dataset has been run through the model once in its entirety. In other words, it means that all training samples have been run through the model. (For the discussion below, let us assume that the number of training examples is m.)

Also by definition, we know that batch_size lies in the range [1, m].

Below is what TensorFlow page says about steps_per_epoch

If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

Now suppose that your training_size, m = 128 and batch_size, b = 16, which means that your data is grouped into 8 batches. According to the above quote, the maximum value you can assign to steps_per_epoch is 8, as computed in one of the answers by @Ioannis Nasios.

However, it is not necessary that you set the value to 8 only (as in our example). You can choose any value between 1 and 8. You just need to be aware that the training will be performed only with this number of batches.

The reason for the jumpy error values could be the size of your batch, as correctly mentioned in this answer by @Chris Farr.

Training & evaluation from tf.data Datasets

If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).

The advantage of a low value for steps_per_epoch is that different epochs are trained with different subsets of the data (a kind of regularization). However, if you have a limited training set, using only a subset of the batches may not be what you want. It is a decision one has to make.
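To make the m = 128, b = 16 example above concrete, here is a small sketch; the tiny model and random data are stand-ins:

import numpy as np
import tensorflow as tf

x = np.random.rand(128, 4).astype("float32")
y = np.random.rand(128, 1).astype("float32")

# 128 samples / batch size 16 = 8 batches per full pass over the data
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16).repeat()

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# steps_per_epoch=8 uses every batch once per epoch; a smaller value (say 4)
# trains each epoch on only half the batches, and because of .repeat() the
# next epoch simply keeps drawing where the previous one left off
model.fit(dataset, steps_per_epoch=8, epochs=5, verbose=0)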

answered by Harsha Y