Validation and training accuracy high in the first epoch [Keras]

I am training an image classifier with 2 classes on 53k images, and validating it on 1.3k images, using Keras. Here is the structure of the neural network:

from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
# Flatten each input into a single 1-D feature vector
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
# Single sigmoid output for binary classification
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy', metrics=['accuracy'])

Training accuracy increases from ~50% to ~85% in the first epoch, with 85% validation accuracy after that epoch. Subsequent epochs increase the training accuracy consistently, but validation accuracy stays in the 80-90% range.

I'm curious: is it possible to get high validation and training accuracy in the first epoch? My understanding was that accuracy starts low and increases steadily with each passing epoch.

Thanks

EDIT: The image size is 150x150 after rescaling, and the mini-batch size is 16.

asked Jan 02 '23 by Swapnil Dhanwal


2 Answers

Yes, it is entirely possible to get high accuracy in the first epoch and then only modest improvements afterwards.

If there is enough redundancy in the data and you make enough updates in the first epoch relative to the complexity of your model (which seems fairly easy to optimize), i.e. you use small mini-batches, it's entirely possible that you learn most of the important structure during the first epoch. With a mini-batch size of 16 and 53k images, that is over 3,300 weight updates in the first epoch alone. When you show the data again, the model will start overfitting to peculiarities introduced by the specific images in your training set (thus you get increasing training accuracy), but since you do not provide any novel samples, it will not learn anything new about the underlying properties of your classes.

You can think of your training data as an infinite stream (which is actually the setting in which SGD enjoys its convergence theorems). Do you think that you need more than 50k samples to learn what is important? You can test how data-hungry your model actually is by training it on less data, or by reporting performance after some sub-epoch number of updates, as in the sketch below.
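For example, here is a minimal sketch of sub-epoch reporting using a custom Keras callback. This is an illustration, not the original poster's code; the array names (train_data, train_labels, val_data, val_labels) are assumptions to replace with your own data.

from keras.callbacks import Callback

class SubEpochEval(Callback):
    """Evaluate on the validation set every `every_n_batches` weight updates."""
    def __init__(self, val_data, val_labels, every_n_batches=500):
        super().__init__()
        self.val_data = val_data
        self.val_labels = val_labels
        self.every_n_batches = every_n_batches
        self.seen = 0

    def on_batch_end(self, batch, logs=None):
        self.seen += 1
        if self.seen % self.every_n_batches == 0:
            # self.model is set by Keras; evaluate returns [loss, accuracy]
            loss, acc = self.model.evaluate(self.val_data, self.val_labels, verbose=0)
            print('  after %d updates: val_loss=%.4f, val_acc=%.4f' % (self.seen, loss, acc))

# With batch_size=16, every_n_batches=500 reports validation accuracy
# roughly every 8,000 samples instead of once per 53k-sample epoch.
model.fit(train_data, train_labels,
          batch_size=16, epochs=5,
          callbacks=[SubEpochEval(val_data, val_labels, every_n_batches=500)])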

answered Jan 05 '23 by dedObed


You cannot expect to get an accuracy over 90-95% on image classification using feed-forward neural networks.

You need to use another architecture, a convolutional neural network (CNN), which is the state of the art in image recognition.

It is also very easy to build one using Keras, but it is computationally more intensive than your current model.

If you want to stick with feed-forward layers, the best thing you can do is early stopping, but even that won't give you accuracy over 90%. A sketch of both suggestions is below.
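For illustration only (this is not the answerer's exact architecture), here is a minimal Keras sketch of a small CNN for 150x150 RGB images combined with early stopping on validation loss. The layer sizes and the array names (train_images, train_labels, val_images, val_labels) are assumptions to adapt to your own pipeline.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.callbacks import EarlyStopping

model = Sequential([
    # Convolution + pooling blocks learn local image features
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),  # binary output, as in the question
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stop training once validation loss stops improving for 3 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(train_images, train_labels,
          validation_data=(val_images, val_labels),
          batch_size=16, epochs=50,
          callbacks=[early_stop])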

answered Jan 05 '23 by coder3101