I am learning neural networks and Keras. My data looks something like this:
Result, HomeWinPossibility, DrawPossibility, AwayWinPossibility
[['AwayWin' 0.41 0.28 0.31]
['HomeWin' 0.55 0.25 0.2]
['AwayWin' 0.17 0.21 0.62]
.....
Here is my model:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(16, input_shape=(3,)))   # 3 input features: the three possibilities
model.add(Activation('sigmoid'))
model.add(Dense(8, activation='relu'))
model.add(Dense(3))
model.add(Activation('softmax'))         # 3 output classes: HomeWin, Draw, AwayWin
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(train_X, train_y_ohe, epochs=100, batch_size=1, verbose=1)
The output from fit is:
Epoch 1/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9151 - acc: 0.5737
Epoch 2/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9181 - acc: 0.5474
Epoch 3/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9111 - acc: 0.5526
...
Epoch 100/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9130 - acc: 0.5579
So why is the loss not going down, as it does in some NN tutorials I have read? Is it because the data I provided is just noise, so the NN can't find any clues, or is something not right with my model?
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (> 33%)? If so, why did it already reach an accuracy of 0.57 in the first epoch?
The fewer parameters a network has, the simpler the functions it can approximate. So, rule 1: if the training loss is not decreasing, chances are the model is too simple for the data.
At times, the validation loss is greater than the training loss. This may indicate that the model is underfitting. Underfitting occurs when the model is unable to accurately model the training data, and hence generates large errors.
It is likely that your model is overfitting the data, especially given the size of your model compared to the size of your dataset. There is most likely nothing wrong with your code; instead, you should either use a smaller model or increase the size of your dataset, or more likely both.
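As a quick sanity check (a sketch that assumes the model and train_X from the question are in scope), you can compare the parameter count with the number of training rows; the architecture above has 64 + 136 + 27 = 227 parameters against roughly 190 rows:
print("Parameters:", model.count_params())   # 227 for the architecture shown in the question
print("Training rows:", len(train_X))        # ~190 according to the fit log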
The right number of epochs depends on the inherent complexity of your dataset. A good rule of thumb is to start with a value around three times the number of columns in your data. If you find that the model is still improving after all epochs complete, try again with a higher value.
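If you would rather not guess the epoch count, one common option is Keras's EarlyStopping callback, which stops training once the validation loss stops improving. A minimal sketch, assuming the compiled model, train_X and train_y_ohe from the question (the patience, validation_split and batch_size values are only illustrative):
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(train_X, train_y_ohe,
          validation_split=0.2,   # hold out 20% of the data to monitor val_loss
          epochs=500,             # upper bound; training stops once val_loss stops improving
          batch_size=16,
          callbacks=[early_stop],
          verbose=1)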
So why is the loss not going down, as it does in some NN tutorials I have read?
There could be many reasons, all depending on your data. Here are some things you could adjust (a consolidated sketch follows the list):
You have a very low batch size. Although some data might actually respond well to this, a batch size of 1 is too small in most cases: each gradient update is computed from a single sample, which makes training noisy and slow. Batch size depends a lot on how much, and what kind of, data you have, but try somewhere around 20-30 if you have sufficient data.
Try different activation functions (but always have softmax or sigmoid in the last layer, because you want numbers between 0 and 1).
Increase the number of units in the first and/or second layer (if you have enough data).
Try setting the learning rate (lr) for the Adam optimizer: model.compile(optimizer=keras.optimizers.Adam(lr=0.001), ...)
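Pulling these suggestions together, a revised model could look like the sketch below; the layer sizes, batch size and learning rate are illustrative starting points rather than tuned values:
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(3,)))  # more units, relu in the hidden layers
model.add(Dense(16, activation='relu'))
model.add(Dense(3, activation='softmax'))                  # softmax keeps the three outputs in [0, 1]

model.compile(optimizer=keras.optimizers.Adam(lr=0.001),   # explicit learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_X, train_y_ohe, epochs=100, batch_size=24, verbose=1)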
Is it because the data I provided is just noise?
If your data is pure noise across classes, then, given that there are roughly the same number of data points in each class, the accuracy would very probably be around 33%, since the network would essentially just be guessing at random.
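To see what random guessing would actually score on your data, check the class balance. A small sketch, assuming the raw string labels (the first column of the data shown in the question) are in a hypothetical array called labels:
import numpy as np

classes, counts = np.unique(labels, return_counts=True)   # e.g. ['AwayWin', 'Draw', 'HomeWin']
print(dict(zip(classes, counts)))
print("Majority-class baseline accuracy:", counts.max() / counts.sum())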
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (33%)?
Not necessarily. Accuracy is a measure of how many samples were correctly classified. Say that the validation data (conventionally the part of the dataset that the accuracy is calculated on) only contains data from one class. Then if the NN classifies everything as that one class, it would reach 100% accuracy on the validation data!
That means that if you don't have roughly the same number of data points in each class, accuracy is not to be trusted on its own! A much better measure for imbalanced datasets is, for example, the AUC (area under the ROC curve) or the F1 score, which also takes false positives into account.
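For example, with scikit-learn you can report the F1 score and AUC alongside accuracy. A sketch, assuming hypothetical held-out arrays val_X (features) and val_y (integer class indices 0-2):
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

probs = model.predict(val_X)        # softmax probabilities, shape (n_samples, 3)
preds = np.argmax(probs, axis=1)    # predicted class index per sample

print("Accuracy :", accuracy_score(val_y, preds))
print("Macro F1 :", f1_score(val_y, preds, average='macro'))
print("Macro AUC:", roc_auc_score(val_y, probs, multi_class='ovr', average='macro'))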
I would recommend that you look into the theory behind this. Experimenting blindly will probably be very frustrating, because you would have a hard time getting good results, and even when you do get good results, they may not be as good as you think. One place to start would be Ian Goodfellow's book on deep learning.