Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

First training epoch is very slow

Hi… I’m running mnist code in my P3 AWS machine and the initialization process seems to be very long compared to my previous P2 machine (although P3>P2)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 265s 4ms/step - loss: 0.2674 - acc: 0.9175 - val_loss: 0.0602 - val_acc: 0.9811
Epoch 2/10
60000/60000 [==============================] - 3s 51us/step - loss: 0.0860 - acc: 0.9742 - val_loss: 0.0393 - val_acc: 0.9866
Epoch 3/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0647 - acc: 0.9808 - val_loss: 0.0338 - val_acc: 0.9884
Epoch 4/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0542 - acc: 0.9839 - val_loss: 0.0337 - val_acc: 0.9887
Epoch 5/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0453 - acc: 0.9863 - val_loss: 0.0311 - val_acc: 0.9900
Epoch 6/10
60000/60000 [==============================] - 3s 51us/step - loss: 0.0412 - acc: 0.9873 - val_loss: 0.0291 - val_acc: 0.9898
Epoch 7/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0368 - acc: 0.9891 - val_loss: 0.0300 - val_acc: 0.9901
Epoch 8/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0340 - acc: 0.9897 - val_loss: 0.0298 - val_acc: 0.9897
Epoch 9/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0320 - acc: 0.9908 - val_loss: 0.0267 - val_acc: 0.9916
Epoch 10/10
60000/60000 [==============================] - 3s 50us/step - loss: 0.0286 - acc: 0.9914 - val_loss: 0.0276 - val_acc: 0.9903
Test loss: 0.02757222411266339
Test accuracy: 0.9903

I’m using Keras=2.1.4 tensorflow-gpu=1.5.0

my keras.json file is configured as follows:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

Any ideas why is it like that?

Thanks in advance

like image 552
Jenia Golbstein Avatar asked Feb 27 '18 09:02

Jenia Golbstein


1 Answers

Based on this issue:

The first epoch takes the same time, but the counter also takes into account the time taken by building the part of the computational graph that deals with training (a few seconds). This used to be done during the compile step, but now it is done lazily one demand to avoid unnecessary work.

like image 195
Ali Abbasi Avatar answered Oct 05 '22 12:10

Ali Abbasi