While defining a prototxt in Caffe, I noticed that sometimes Softmax is used as the last layer type and sometimes SoftmaxWithLoss. I know the Softmax layer returns the probability that the input data belongs to each class, but it seems that SoftmaxWithLoss also returns the class probabilities. What is the difference between them? Or have I misunderstood how the two layer types are used?
Softmax returns the probability of each target class given the model predictions. SoftmaxWithLoss not only applies the softmax operation to the predictions, but also computes the multinomial logistic loss and returns it as its output. The loss is essential for the training phase: without it there is no gradient with which to update the network parameters.
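To make the distinction concrete, here is a minimal sketch in plain Python of what the two operations compute for a single example. It is not Caffe's implementation: the function names and the example scores are hypothetical, and it leaves out batching and the loss normalization that Caffe applies.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the max score improves numerical stability
    # and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_loss(scores, label):
    """Softmax followed by the multinomial logistic loss for one example."""
    probs = softmax(scores)
    # The loss is the negative log-probability assigned to the true class.
    return -math.log(probs[label])

# Hypothetical raw outputs of the last fully connected layer (3 classes).
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)                     # what a Softmax layer outputs
loss = softmax_with_loss(scores, label=0)   # what SoftmaxWithLoss outputs
```

The first function needs only the scores, which is why a Softmax layer is suited to deployment. The second also needs the ground-truth label and yields a single scalar that can be minimized during training.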
See SoftmaxWithLossLayer and Caffe Loss for more info.