In Neural Networks: accuracy improvement after each epoch is GREATER than accuracy improvement after each batch. Why?

I am training a neural network in batches with the Keras 2.0 package for Python. Below is some information about the data and the training parameters:

  • #samples in train: 414934
  • #features: 590093
  • #classes: 2 (binary classification problem)
  • batch size: 1024
  • #batches = 406 (414934 / 1024 ≈ 405.2, rounded up; see the sketch below)
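
A one-liner recovers that batch count; num_of_batches matches the name used in the training loop below (a sketch, not code from the question):

import math

num_of_batches = int(math.ceil(414934 / float(1024)))  # ceil(405.2) = 406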

Below are some logs of the following code:

# run one epoch per fit_generator call, so each epoch prints its own log block
for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)
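
The batch_generator used above is not shown in the question; a minimal sketch of such a generator, assuming data_train and target_train support NumPy-style row slicing, could look like:

def batch_generator(X, y, batch_size):
    # fit_generator expects an endless generator of (inputs, targets)
    # tuples; Keras draws steps_per_epoch batches from it per epoch
    n = X.shape[0]
    while True:
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            # with a sparse X (590093 features), one would typically
            # yield X[start:end].toarray() here instead
            yield X[start:end], y[start:end]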

(partial) Logs:

train_model:: starting epoch 1/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996         
  2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587         
  3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279         
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917            
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917            
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918              
train_model:: starting epoch 2/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424         
  2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419         
  3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408         
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329            
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329            
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329              
train_model:: starting epoch 3/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756         
  2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688         
  3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691         
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745            
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745            
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745    

My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but an accuracy of 0.9424 is observed at the beginning of the second epoch. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with an accuracy of 0.9756.

I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.

I know that each batch involves one forward pass and one backward pass for the samples in that batch, so each epoch involves one forward pass and one backward pass over all training samples.

Also, from Keras documentation:

Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.

Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?

asked May 23 '17 by Mockingbird


2 Answers

This has nothing to do with your model or your dataset; the reason for this "jump" lies in how metrics are calculated and displayed in Keras.

As Keras processes batch after batch, it saves the accuracy of each one, and what it displays to you is not the accuracy on the latest processed batch but the running average over all batches processed so far in the current epoch. And since the model is being trained, accuracies over successive batches tend to improve.

Now consider: in the first epoch there are, let's say, 50 batches, and the network went from 0% to 90% accuracy during these 50 batches. At the end of the epoch Keras will then show an accuracy of e.g. (0% + 0.1% + 0.5% + ... + 90%) / 50, which is obviously much less than 90%! But because your actual accuracy is 90%, the first batch of the second epoch will show 90%, giving the impression of a sudden "jump" in quality. The same obviously goes for loss or any other metric.
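
A toy numeric sketch of this averaging effect (the numbers are invented for illustration, not taken from the logs above):

# per-batch accuracy rising roughly linearly from ~0 to 0.9 over 50 batches
batch_acc = [0.9 * i / 50 for i in range(1, 51)]

shown = []
total = 0.0
for i, acc in enumerate(batch_acc, start=1):
    total += acc
    shown.append(total / i)  # running average: what the progress bar prints

print("accuracy of the last batch:   %.3f" % batch_acc[-1])  # 0.900
print("accuracy shown for the epoch: %.3f" % shown[-1])      # ~0.459

The epoch-end number (~0.459) sits far below the last batch (0.900), which is exactly the gap the question observes between the end of one epoch and the start of the next.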

Now, if you want a more realistic and trustworthy calculation of accuracy, loss, or any other metric you may find yourself using, I would suggest using the validation_data parameter in model.fit[_generator] to provide validation data, which will not be used for training but only to evaluate the network at the end of each epoch, without any averaging over various points in time.
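
As a sketch of that suggestion, reusing the names from the question (data_val and target_val stand for a held-out split that does not appear in the question):

model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                    steps_per_epoch=num_of_batches,
                    epochs=epochs,  # the manual epoch loop is no longer needed
                    validation_data=(data_val, target_val),  # evaluated once per epoch
                    verbose=1)

The val_loss / val_acc printed at the end of each epoch are then computed with a single, fixed set of weights, so they are directly comparable across epochs.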

answered Oct 11 '22 by Akiiino


The accuracy at the end of an epoch is the accuracy over the full dataset. The accuracy after each batch is the accuracy over all batches that have been used for training up to that moment. It could be the case that your first batch is predicted very well and the following batches have a lower accuracy; in that case the accuracy over your full dataset will be low compared to the accuracy of your first batch.
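
If you want to inspect the raw per-batch numbers rather than the running average, one option is a custom callback; a minimal sketch, assuming the metric appears in the batch logs under the key 'acc' as in the logs above:

from keras.callbacks import Callback

class BatchAccuracyLogger(Callback):
    def on_train_begin(self, logs=None):
        self.batch_acc = []

    def on_batch_end(self, batch, logs=None):
        # logs['acc'] here is the accuracy of this batch alone, before
        # the progress bar folds it into the epoch's running average
        self.batch_acc.append((logs or {}).get('acc'))

# usage: pass callbacks=[BatchAccuracyLogger()] to fit_generator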

answered Oct 11 '22 by Wilmar van Ommeren