I am trying to build a neural net for the CIFAR-10 dataset, but I am facing a very odd problem. I have tried more than 6 different CNN architectures, with many different CNN hyperparameters and numbers of neurons in the fully connected layers, but all of them fail with a loss of 2.302 and a corresponding accuracy of 0.0625. I also tried dropout, l2_norm, different kernel sizes, and different padding in the CNN and max-pool layers. Why does this happen, and what property of CNNs or neural nets causes it? I don't understand why the loss gets stuck at such an odd number.
I am implementing this in TensorFlow, and I have tried both a softmax layer with cross_entropy_loss and no softmax layer with sparse_cross_entropy_loss. Is this a plateau that the loss function is stuck on?
It looks like you accidentally applied a non-linearity (activation function) to the last layer of your network. Keep in mind that cross entropy operates on probabilities, i.e. values between 0 and 1. The softmax applied just before the cross entropy already maps your output into that range, so the last layer itself should be linear: just don't add any activation to it.
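To make this concrete, here is a minimal sketch in plain Python (no TensorFlow) of one way an extra activation on the last layer can pin the loss. The choice of ReLU, the `raw` values, and the `label` are assumptions made purely for illustration; they are not taken from the question's code. If the final layer's pre-activations all happen to be negative, ReLU turns them into zeros, softmax becomes uniform over the 10 classes, and the loss sits at ln(10).

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-probability that softmax assigns to the true class.
    return -math.log(softmax(logits)[label])

def relu(values):
    return [max(0.0, v) for v in values]

# Hypothetical final-layer pre-activations for one image (all negative).
raw = [-3.0, -1.5, -0.2, -4.0, -2.2, -0.7, -5.1, -1.1, -2.9, -0.4]
label = 2  # hypothetical true class

loss_linear = cross_entropy(raw, label)      # last layer left linear
loss_relu = cross_entropy(relu(raw), label)  # ReLU wrongly applied first

print("linear output:", loss_linear)
print("ReLU output:  ", loss_relu)  # equals ln(10), the stuck value
```

With the ReLU in place every output is clipped to zero here, so no gradient flows back through that layer and the loss cannot move; with a linear output the same pre-activations already give a lower loss.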
By the way, the value of 2.302 is no coincidence. It is the softmax loss -ln(0.1), which is what you get when all 10 classes (CIFAR-10) are assigned the same diffuse probability of 0.1, as expected at initialization. Check out the explanation by Andrej Karpathy: http://cs231n.github.io/neural-networks-3/
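The arithmetic is easy to check yourself. A quick sketch (the variable names are just illustrative):

```python
import math

num_classes = 10  # CIFAR-10 has 10 classes
uniform_prob = 1.0 / num_classes

# Cross-entropy loss when the true class is assigned probability 0.1.
uniform_loss = -math.log(uniform_prob)
print(round(uniform_loss, 4))  # 2.3026
```

So a loss that stays at about 2.302 suggests the network is still predicting the same probability for every class.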