I'm pretty sure this is a silly question but I can't find it anywhere else so I'm going to ask it here.
I'm doing semantic image segmentation using a CNN (U-Net) in Keras with 7 labels, so the label for each image has shape (7,n_rows,n_cols) with the Theano backend. Across the 7 layers, each pixel is one-hot encoded. In this case, is categorical cross-entropy the correct error function to use? It seems that way to me, but the network seems to learn better with binary cross-entropy loss. Can someone shed some light on why that would be and what the principled objective is?
The most commonly used loss function for the task of image segmentation is a pixel-wise cross entropy loss.
BCE-Dice Loss: This loss combines Dice loss with the standard binary cross-entropy (BCE) loss that is generally the default for segmentation models. Combining the two methods allows for some diversity in the loss, while benefiting from the stability of BCE.
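As a rough illustration of how the two terms can be combined, here is a minimal pure-Python sketch on a flat list of pixels. The function names, the `smooth` constant, and the unweighted sum of the two terms are assumptions made for this example, not a reference implementation from any library.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def dice_loss(y_true, y_pred, smooth=1.0):
    """1 minus the soft Dice coefficient; `smooth` avoids division by zero."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denom = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def bce_dice_loss(y_true, y_pred):
    # Unweighted sum of the two terms (the weighting is an assumption).
    return bce(y_true, y_pred) + dice_loss(y_true, y_pred)

mask = [1, 1, 0, 0]            # hypothetical ground-truth mask
good = [0.9, 0.8, 0.1, 0.2]    # predictions close to the mask
bad = [0.2, 0.1, 0.9, 0.8]     # predictions opposite to the mask

print("good:", bce_dice_loss(mask, good))
print("bad: ", bce_dice_loss(mask, bad))
```

The prediction that agrees with the mask gets the lower combined loss, and a perfect prediction drives the Dice term to zero.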
Binary cross-entropy computes the cross-entropy loss between the predicted classes and the true classes. By default, the sum_over_batch_size reduction is used, which means the loss returns the average of the per-sample losses in the batch.
Abstract: Image structures are segmented automatically using deep learning (DL) for analysis and processing. The three most popular base loss functions are cross entropy (crossE), intersect-over-the-union (IoU), and Dice.
Binary cross-entropy loss should be used with sigmoid activation in the last layer, and it severely penalizes opposite predictions. It does not take into account that the output is one-hot coded and that the predictions should sum to 1. But because mis-predictions are penalized so severely, the model still somewhat learns to classify properly.
To enforce the one-hot prior, use softmax activation with categorical cross-entropy. This is what you should use.
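To make the difference concrete, here is a small pure-Python sketch for a single pixel with 7 classes. The logits are made-up values and the helper names are hypothetical; the point is only that independent sigmoid outputs need not sum to 1, while softmax outputs do.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def binary_crossentropy(target, probs):
    # Mean over classes of independent per-class BCE terms.
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(target, probs)) / len(target)

def categorical_crossentropy(target, probs):
    # Only the true class contributes to the loss.
    return -sum(t * math.log(p) for t, p in zip(target, probs))

logits = [2.0, 1.0, 0.5, -1.0, -1.0, -2.0, -3.0]  # hypothetical pixel, 7 classes
target = [1, 0, 0, 0, 0, 0, 0]                    # one-hot label

sig = [sigmoid(x) for x in logits]
soft = softmax(logits)

print("sum of sigmoid outputs:", sum(sig))    # not constrained to 1
print("sum of softmax outputs:", sum(soft))   # 1 up to rounding
print("BCE on sigmoid outputs:", binary_crossentropy(target, sig))
print("CCE on softmax outputs:", categorical_crossentropy(target, soft))
```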
The problem with using softmax in your case is that Keras doesn't support softmax on each pixel.
The easiest way to go about it is to permute the dimensions to (n_rows,n_cols,7) using a Permute layer and then reshape to (n_rows*n_cols,7) using a Reshape layer. Then you can add the softmax activation layer and use categorical cross-entropy loss. The data should also be reshaped accordingly.
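For the data side of this, a NumPy sketch of the same permute-then-reshape transformation may help. The sizes and random values are hypothetical, and the softmax here is a plain NumPy stand-in for the network's activation, not Keras code.

```python
import numpy as np

n_classes, n_rows, n_cols = 7, 4, 5  # small hypothetical sizes

# Integer class map -> one-hot labels in channels-first layout (7, n_rows, n_cols).
rng = np.random.RandomState(0)
class_map = rng.randint(0, n_classes, size=(n_rows, n_cols))
labels_cf = np.eye(n_classes)[class_map].transpose(2, 0, 1)

# Same transformation the Permute + Reshape layers apply to the network output:
# (7, n_rows, n_cols) -> (n_rows, n_cols, 7) -> (n_rows * n_cols, 7)
labels_flat = labels_cf.transpose(1, 2, 0).reshape(n_rows * n_cols, n_classes)

def softmax_rows(scores):
    # Softmax over the last axis, i.e. over the 7 classes of each pixel.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Stand-in for raw network outputs, flattened the same way as the labels.
scores_cf = rng.randn(n_classes, n_rows, n_cols)
probs = softmax_rows(scores_cf.transpose(1, 2, 0).reshape(-1, n_classes))

# Categorical cross-entropy averaged over all pixels.
loss = -np.mean(np.sum(labels_flat * np.log(probs), axis=-1))
print(labels_flat.shape, probs.shape, loss)
```

Each row of `labels_flat` is the one-hot label of one pixel, so a standard softmax plus categorical cross-entropy applies row by row.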
The other way is to implement a depth-wise softmax:
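from keras import backend as K

def depth_softmax(matrix):
    # Squash each value to (0, 1), then normalize across the class axis
    # (the last axis after the Permute) so each pixel's 7 values sum to 1.
    # Note: axis=0 would be the batch axis inside a Lambda layer.
    sigmoided_matrix = K.sigmoid(matrix)
    softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=-1, keepdims=True)
    return softmax_matrix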
and use it as a lambda layer:
model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(7,n_rows,n_cols)))
model.add(Permute((2, 3, 1)))  # Permute takes the new ordering as a single tuple
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))
If the tf image_dim_ordering is used, then you can do away with the Permute layers.
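One thing worth checking is how the sigmoid-then-normalize function compares with a standard softmax. The NumPy sketch below is an analogue written for illustration (the `_np` function names and the random input are assumptions, and the class axis is taken to be the last one): both outputs sum to 1 per pixel and pick the same winning class, but the probabilities themselves differ.

```python
import numpy as np

def depth_softmax_np(x):
    # NumPy analogue of sigmoid followed by normalization over the class axis.
    s = 1.0 / (1.0 + np.exp(-x))
    return s / s.sum(axis=-1, keepdims=True)

def softmax_np(x):
    # Standard softmax over the class axis, shifted for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical activations shaped (batch, rows, cols, classes).
x = np.random.RandomState(0).randn(2, 3, 3, 7)
a = depth_softmax_np(x)
b = softmax_np(x)

print("both sum to 1 per pixel:",
      np.allclose(a.sum(axis=-1), 1.0), np.allclose(b.sum(axis=-1), 1.0))
print("same argmax per pixel:", (a.argmax(axis=-1) == b.argmax(axis=-1)).all())
print("largest probability gap:", np.abs(a - b).max())
```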
For more reference check here.