I am training a normal feed-forward network on financial data of the last 90 days of a stock, and I am predicting whether the stock will go up or down on the next day. I am using binary cross entropy as my loss and standard SGD for the optimizer. When I train, the training and validation loss continue to go down as they should, but the accuracy and validation accuracy stay around the same.
Here's my model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 90, 256) 1536
_________________________________________________________________
elu (ELU) (None, 90, 256) 0
_________________________________________________________________
flatten (Flatten) (None, 23040) 0
_________________________________________________________________
dropout (Dropout) (None, 23040) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 23593984
_________________________________________________________________
elu_1 (ELU) (None, 1024) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 524800
_________________________________________________________________
elu_2 (ELU) (None, 512) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 512) 262656
_________________________________________________________________
elu_3 (ELU) (None, 512) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
dense_4 (Dense) (None, 256) 131328
_________________________________________________________________
activation (Activation) (None, 256) 0
_________________________________________________________________
dense_5 (Dense) (None, 2) 514
_________________________________________________________________
activation_1 (Activation) (None, 2) 0
_________________________________________________________________
Total params: 24,514,818
Trainable params: 24,514,818
Non-trainable params: 0
_________________________________________________________________
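For reference, a model matching this summary can be built roughly like the sketch below. The input shape (90, 5) is inferred from the first layer's 1,536 parameters; the dropout rates and the two unnamed activations (ELU, then softmax) are placeholders, not confirmed by the summary:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(90, 5)),   # 90 days x 5 features (inferred: 256 * (5 + 1) = 1,536 params)
    layers.Dense(256),
    layers.ELU(),
    layers.Flatten(),             # 90 * 256 = 23,040
    layers.Dropout(0.5),          # rate assumed
    layers.Dense(1024),
    layers.ELU(),
    layers.Dropout(0.5),
    layers.Dense(512),
    layers.ELU(),
    layers.Dropout(0.5),
    layers.Dense(512),
    layers.ELU(),
    layers.Dropout(0.5),
    layers.Dense(256),
    layers.Activation("elu"),     # activation assumed
    layers.Dense(2),
    layers.Activation("softmax"), # activation assumed
])

model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                   # reproduces the parameter counts above
```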
I expect that either both losses should decrease while both accuracies increase, or the network will overfit and the validation loss and accuracy won't change much. Either way, shouldn't the loss and its corresponding accuracy value be directly linked and move inversely to each other?
Also, I notice that my validation loss is always less than my normal loss, which seems wrong to me.
Here's the loss (Normal: Blue, Validation: Green):
Here's the accuracy (Normal: Black, Validation: Yellow):
By definition, accuracy is the fraction of predictions that are correct, while the loss measures how far the predicted probabilities are from the desired targets. The two can move somewhat independently. Low loss with high accuracy is the best case: the model is right on most samples and confident about it. High loss with low accuracy means the model makes big, confident errors on most of the data. Low loss with low accuracy means the errors are individually small, but many predictions still land on the wrong side of the decision threshold. And high loss with high accuracy means the model is right on most samples but makes a few very large, confident errors.
Loss and accuracy are indeed connected, but the relationship is not so simple.
Let's say we have 6 samples, and our y_true is:
[0, 0, 0, 1, 1, 1]
Furthermore, let's assume our network predicts the following probabilities:
[0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
This gives a mean binary cross-entropy loss of ~2.30, and an accuracy of zero, since every sample lands on the wrong side of the 0.5 threshold.
Now, after a parameter update via backprop, suppose the new predictions are:
[0.6, 0.6, 0.6, 0.4, 0.4, 0.4]
These are better estimates of the true distribution (the loss drops to ~0.92), yet the accuracy hasn't changed and is still zero.
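To check the arithmetic, here is a small NumPy sketch (the helper names bce and accuracy are mine) that reproduces both loss values and the unchanged zero accuracy:

```python
import numpy as np

def bce(y_true, y_pred):
    """Mean binary cross-entropy over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((np.asarray(y_pred) > threshold).astype(int) == np.asarray(y_true)))

y_true = [0, 0, 0, 1, 1, 1]
before = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
after  = [0.6, 0.6, 0.6, 0.4, 0.4, 0.4]

print(bce(y_true, before), accuracy(y_true, before))  # ~2.30, 0.0
print(bce(y_true, after),  accuracy(y_true, after))   # ~0.92, 0.0 -- loss improved, accuracy did not
```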
All in all, the relationship is more complicated than a simple inverse: the network can improve its predictions on some examples while getting worse on others, which keeps the accuracy roughly constant even as the loss changes.
Such a situation usually occurs when your data is really complicated (or incomplete) and/or your model is too weak. Here both are the case: financial data prediction involves a lot of hidden variables that your model cannot infer. In addition, plain dense layers are not well suited to this task; each day depends on the previous values, which makes it a natural fit for recurrent neural networks. You can find an article about LSTMs and how to use them here (and tons of others over the web).
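For illustration only, a recurrent baseline in Keras could look like the sketch below; the layer sizes, dropout rate, and single sigmoid output are my own assumptions, not something prescribed by the answer:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small LSTM baseline: 90 time steps, 5 features per day (adjust to your data).
rnn = keras.Sequential([
    keras.Input(shape=(90, 5)),
    layers.LSTM(64, return_sequences=True),  # keep the sequence for the second recurrent layer
    layers.LSTM(32),                          # collapse the sequence into a single vector
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),    # probability that the stock goes up
])

rnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```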