
loss, val_loss, acc and val_acc do not update at all over epochs

I created an LSTM network for sequence classification (binary) where each sample has 25 timesteps and 4 features. The following is my keras network topology:

[Image: Keras model summary of the LSTM network]
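
Roughly, the model in the summary above can be sketched as follows (the LSTM width is an assumption; the exact layer sizes are in the summary image):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# Rough sketch of the described topology; the LSTM width (64) is an assumption,
# the rest follows the description in this question.
model = Sequential()
model.add(LSTM(64, input_shape=(25, 4)))   # 25 timesteps, 4 features per sample
model.add(Dense(1))
model.add(Activation('softmax'))           # softmax activation after the Dense layer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])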

Above, the activation layer after the Dense layer uses the softmax function. I used binary_crossentropy as the loss function and Adam as the optimizer to compile the Keras model. I trained the model with batch_size=256, shuffle=True and validation_split=0.05. The following is the training log:

Train on 618196 samples, validate on 32537 samples
2017-09-15 01:23:34.407434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-15 01:23:34.407719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.493
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.47GiB
2017-09-15 01:23:34.407735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-15 01:23:34.407757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-15 01:23:34.407764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
618196/618196 [==============================] - 139s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 2/50
618196/618196 [==============================] - 132s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 3/50
618196/618196 [==============================] - 134s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 4/50
618196/618196 [==============================] - 133s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 5/50
618196/618196 [==============================] - 132s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 6/50
618196/618196 [==============================] - 132s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 7/50
618196/618196 [==============================] - 132s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251
Epoch 8/50
618196/618196 [==============================] - 132s - loss: 4.3489 - acc: 0.7302 - val_loss: 4.4316 - val_acc: 0.7251

... and so on through 50 epochs with the same numbers

So far, I have also tried the rmsprop and nadam optimizers, and batch sizes of 128, 512 and 1024, but loss, val_loss, acc and val_acc always remained the same throughout all epochs, yielding accuracy in the range of 0.72 to 0.74 in each attempt.

Asked Sep 15 '17 by Kaushik Shrestha

People also ask

What are loss and val_loss?

val_loss is the value of the cost function on your cross-validation data and loss is the value of the cost function on your training data. On validation data, dropout is not applied, so no neurons are randomly dropped.

What is loss in epochs?

"loss" refers to the loss value over the training data after each epoch. This is what the optimization process is trying to minimize with the training so, the lower, the better. "accuracy" refers to the ratio between correct predictions and the total number of predictions in the training data. The higher, the better.

What does Val_acc mean?

val_acc is the accuracy computed on the validation set (data that have never been 'seen' by the model). Batch size for testing is exactly the same concept as the training batch size: you usually cannot load all your testing data into memory, so you have to use batches.

What is Val accuracy in keras?

'val_acc' refers to the validation set. Note that val_acc is computed on a set of samples that was not shown to the network during training, and hence indicates how well your model works in general on cases outside the training set. It is common for validation accuracy to be lower than training accuracy.
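
For reference, all four of these values come from the History object that Keras returns from fit. A minimal sketch, assuming model, x_train and y_train are already defined:

# model, x_train and y_train are placeholders for your own objects
history = model.fit(x_train, y_train,
                    batch_size=256,
                    epochs=50,
                    shuffle=True,
                    validation_split=0.05)

# Keras 2 of this era records the metrics under these keys
print(history.history['loss'][-1], history.history['acc'][-1])
print(history.history['val_loss'][-1], history.history['val_acc'][-1])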


1 Answer

The softmax activation makes sure the sum of the outputs is 1. It's useful for ensuring that exactly one class among many classes is predicted.

Since you have only 1 output (only one class), it's certainly a bad idea. You're probably ending up with 1 as the result for all samples.
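
A quick way to see this: softmax normalizes over the output vector, and for a one-element vector that normalization always yields 1 (the logit values below are hypothetical):

import numpy as np

def softmax(z):
    # numerically stable softmax over a 1-D array
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([-3.2])))  # [ 1.] -- a single-output softmax is always 1
print(softmax(np.array([5.7])))   # [ 1.] -- regardless of the logit value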

Use sigmoid instead. It goes well with binary_crossentropy.
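
A sketch of the suggested change, under the same assumptions about layer sizes as above:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(25, 4)))    # 25 timesteps, 4 features
model.add(Dense(1, activation='sigmoid'))   # sigmoid instead of softmax
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

With a single sigmoid unit the output is interpreted as the probability of the positive class, which is exactly what binary_crossentropy expects.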

Answered Oct 21 '22 by Daniel Möller