I am learning neural networks and Keras. My data looks something like this:
Result, HomeWinPossibility, DrawPossibility, AwayWinPossibility
[['AwayWin' 0.41 0.28 0.31]
['HomeWin' 0.55 0.25 0.2]
['AwayWin' 0.17 0.21 0.62]
.....
Here is my model:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(16, input_shape=(3,)))   # 3 input features: the three possibilities
model.add(Activation('sigmoid'))
model.add(Dense(8, activation='relu'))
model.add(Dense(3))
model.add(Activation('softmax'))         # 3 output classes: HomeWin, Draw, AwayWin
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(train_X, train_y_ohe, epochs=100, batch_size=1, verbose=1)
The output from fit is:
Epoch 1/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9151 - acc: 0.5737
Epoch 2/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9181 - acc: 0.5474
Epoch 3/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9111 - acc: 0.5526
...
Epoch 100/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9130 - acc: 0.5579
So why is the loss not going down, as it does in some NN tutorials I have read? Is it because the data I provided is just noise, so the NN can't find any clues, or is something not right with my model?
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (> 33%)? If so, why did it already reach an accuracy of 0.57 in the first epoch?
The fewer parameters a network has, the simpler the functions it can approximate. So, rule 1: if the training loss is not decreasing, chances are the model is too simple for the data.
At times, the validation loss is greater than the training loss. This may indicate that the model is underfitting. Underfitting occurs when the model is unable to accurately model the training data, and hence generates large errors.
It is likely that your model is overfitting the data, especially given the size of your model compared to the size of your dataset. There is most likely nothing wrong with your code; instead, you should either use a smaller model or increase the size of your dataset, or more likely both.
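As a quick sanity check (a sketch that assumes the model and train_X from the question are in scope), you can compare the parameter count with the number of training rows; the architecture above has 64 + 136 + 27 = 227 parameters against roughly 190 rows:
print("Parameters:", model.count_params())   # 227 for the architecture shown in the question
print("Training rows:", len(train_X))        # ~190 according to the fit log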
The right number of epochs depends on the inherent complexity of your dataset. A good rule of thumb is to start with a value around three times the number of columns in your data. If you find that the model is still improving after all epochs complete, try again with a higher value.
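If you would rather not guess the epoch count, one common option is Keras's EarlyStopping callback, which stops training once the validation loss stops improving. A minimal sketch, assuming the compiled model, train_X and train_y_ohe from the question (the patience, validation_split and batch_size values are only illustrative):
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(train_X, train_y_ohe,
          validation_split=0.2,   # hold out 20% of the data to monitor val_loss
          epochs=500,             # upper bound; training stops once val_loss stops improving
          batch_size=16,
          callbacks=[early_stop],
          verbose=1)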
So why is the loss not going down, as it does in some NN tutorials I have read?
There could be many reasons, all depending on your data. Here are some things you could adjust (a consolidated sketch follows the list):
You have a very low batch size. Although some data might actually respond well to this, a batch size of 1 is too small in most cases: each gradient update is computed from a single sample, which makes training noisy and slow. Batch size depends a lot on how much, and what kind of, data you have, but try somewhere around 20-30 if you have sufficient data.
Try different activation functions (but always have softmax or sigmoid in the last layer, because you want numbers between 0 and 1).
Increase the number of units in the first and/or second layer (if you have enough data).
Try setting the learning rate (lr) for the Adam optimizer: model.compile(optimizer=keras.optimizers.Adam(lr=0.001), ...)
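Pulling these suggestions together, a revised model could look like the sketch below; the layer sizes, batch size and learning rate are illustrative starting points rather than tuned values:
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(3,)))  # more units, relu in the hidden layers
model.add(Dense(16, activation='relu'))
model.add(Dense(3, activation='softmax'))                  # softmax keeps the three outputs in [0, 1]

model.compile(optimizer=keras.optimizers.Adam(lr=0.001),   # explicit learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_X, train_y_ohe, epochs=100, batch_size=24, verbose=1)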
Is it because the data I provided is just noise?
If your data is pure noise across classes, then, given that there are roughly the same number of data points in each class, the accuracy would very probably be around 33%, since the network would essentially just be guessing at random.
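To see what random guessing would actually score on your data, check the class balance. A small sketch, assuming the raw string labels (the first column of the data shown in the question) are in a hypothetical array called labels:
import numpy as np

classes, counts = np.unique(labels, return_counts=True)   # e.g. ['AwayWin', 'Draw', 'HomeWin']
print(dict(zip(classes, counts)))
print("Majority-class baseline accuracy:", counts.max() / counts.sum())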
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (33%)?
Not necessarily. Accuracy is a measure of how many samples were correctly classified. Say that the validation data (conventionally the part of the dataset that the accuracy is calculated on) only contains data from one class. Then if the NN classifies everything as that one class, it would reach 100% accuracy on the validation data!
That means that if you don't have roughly the same number of data points in each class, accuracy is not to be trusted on its own! A much better measure for imbalanced datasets is, for example, the AUC (area under the ROC curve) or the F1 score, which also takes false positives into account.
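For example, with scikit-learn you can report the F1 score and AUC alongside accuracy. A sketch, assuming hypothetical held-out arrays val_X (features) and val_y (integer class indices 0-2):
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

probs = model.predict(val_X)        # softmax probabilities, shape (n_samples, 3)
preds = np.argmax(probs, axis=1)    # predicted class index per sample

print("Accuracy :", accuracy_score(val_y, preds))
print("Macro F1 :", f1_score(val_y, preds, average='macro'))
print("Macro AUC:", roc_auc_score(val_y, probs, multi_class='ovr', average='macro'))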
I would recommend that you look into the theory behind this. Experimenting blindly will probably be very frustrating, because you would have a hard time getting good results, and even when you do get good results, they may not be as good as you think. One place to start would be Ian Goodfellow's book on deep learning.