
Cross Entropy Loss for Semantic Segmentation Keras

I'm pretty sure this is a silly question but I can't find it anywhere else so I'm going to ask it here.

I'm doing semantic image segmentation using a CNN (U-Net) in Keras with 7 labels. So my label for each image is (7, n_rows, n_cols) using the Theano backend. So across the 7 channels, each pixel is one-hot encoded. In this case, is the correct error function categorical cross-entropy? It seems that way to me, but the network seems to learn better with binary cross-entropy loss. Can someone shed some light on why that would be and what the principled objective is?

TSW asked Feb 08 '17

People also ask

Which loss function is best for semantic segmentation?

The most commonly used loss function for the task of image segmentation is a pixel-wise cross entropy loss.
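
For concreteness, a minimal NumPy sketch of such a pixel-wise cross-entropy (the function name and shapes are illustrative, not from the original post):

import numpy as np

def pixelwise_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot labels, y_pred: predicted probabilities,
    # both shaped (n_rows, n_cols, n_classes)
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))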

What is BCE dice loss?

BCE-Dice loss combines Dice loss with the standard binary cross-entropy (BCE) loss that is generally the default for segmentation models. Combining the two methods allows for some diversity in the loss, while benefiting from the stability of BCE.
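
As an illustration, a minimal BCE-Dice sketch using the Keras backend (the smoothing term and the unweighted sum are assumptions, not a fixed standard):

from keras import backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), on flattened probability maps
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return 1.0 - (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    # clip to avoid log(0), then add pixel-wise BCE and Dice loss
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    bce = -K.mean(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return bce + dice_loss(y_true, y_pred)

# model.compile(optimizer='adam', loss=bce_dice_loss)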

What is binary cross-entropy loss in keras?

Binary cross-entropy calculates the cross-entropy loss between the predicted classes and the true classes. By default, the sum_over_batch_size reduction is used, meaning the loss returns the average of the per-sample losses in the batch.
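
For example, assuming a recent tf.keras (the class and its default reduction come from the TF 2.x API, not the Keras 1.x API used in the answer below):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()  # default reduction: sum_over_batch_size
y_true = [[0.0], [1.0], [1.0], [0.0]]
y_pred = [[0.1], [0.8], [0.6], [0.3]]
loss = bce(y_true, y_pred)                  # scalar: average of the per-sample losses
print(float(loss))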

What are commonly used loss functions for medical image segmentation and classification?

Abstract: Image structures are segmented automatically using deep learning (DL) for analysis and processing. The three most popular base loss functions are cross-entropy (crossE), intersection-over-union (IoU), and Dice.


1 Answer

Binary cross-entropy loss should be used with sigmoid activation in the last layer, and it severely penalizes opposite predictions. It does not take into account that the output is one-hot coded and that the sum of the predictions should be 1. But because mis-predictions are severely penalized, the model somewhat learns to classify properly.

The way to enforce the one-hot prior is to use softmax activation with categorical cross-entropy loss. This is what you should use.
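
A quick numeric illustration of the difference (plain NumPy, made-up scores for the 7 classes):

import numpy as np

logits = np.array([2.0, 1.0, 0.1, -1.0, -0.5, 0.3, -2.0])  # raw scores, one per class
sigmoid = 1.0 / (1.0 + np.exp(-logits))                    # independent per-class probabilities
softmax = np.exp(logits) / np.exp(logits).sum()            # joint distribution over the classes

print(sigmoid.sum())  # != 1 in general: sigmoids ignore the one-hot prior
print(softmax.sum())  # == 1: softmax enforces it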

The problem with using softmax in your case is that Keras doesn't support softmax on each pixel.

The easiest way to go about it is to permute the dimensions to (n_rows, n_cols, 7) using a Permute layer and then reshape to (n_rows*n_cols, 7) using a Reshape layer. Then you can add the softmax activation layer and use categorical cross-entropy loss, as sketched below. The data should also be reshaped accordingly.
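
A minimal sketch of that pipeline, assuming a Keras 1.x Sequential model named model whose last layer outputs (7, n_rows, n_cols), with n_rows and n_cols known (names are illustrative):

from keras.layers import Permute, Reshape, Activation

model.add(Permute((2, 3, 1)))             # (7, n_rows, n_cols) -> (n_rows, n_cols, 7)
model.add(Reshape((n_rows * n_cols, 7)))  # one 7-way prediction per pixel
model.add(Activation('softmax'))          # softmax over the 7 classes
model.compile(optimizer='adam', loss='categorical_crossentropy')

# the labels must be reshaped the same way, to (n_samples, n_rows * n_cols, 7)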

The other way is to implement a depth-softmax:

from keras import backend as K

def depth_softmax(matrix):
    # sigmoid + normalisation over the class axis: an approximation of softmax
    sigmoid = lambda x: 1 / (1 + K.exp(-x))
    sigmoided_matrix = sigmoid(matrix)
    # classes sit on the last axis after the Permute layer below, so axis=-1
    softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=-1, keepdims=True)
    return softmax_matrix

and use it as a lambda layer:

model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(7, n_rows, n_cols)))
model.add(Permute((2, 3, 1)))  # move the class axis to the end
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))

If the tf image_dim_ordering is used, then you can do away with the Permute layers.
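
For example, a channels-last version of the snippet above might look like this (a sketch under the same assumptions as the author's code):

# with image_dim_ordering='tf' the class axis is already last
model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(n_rows, n_cols, 7)))
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))  # no Permute layer needed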

For more reference check here.

indraforyou answered Sep 30 '22