I am doing image semantic segmentation with U-Net. If I set a softmax activation for the
last layer, like this:
...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # per-pixel class scores
conv10 = Activation('softmax')(conv9)                      # softmax over the class (last) axis
model = Model(inputs, conv10)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False),
the training does not converge, even for a single training image.
But if I do not set a softmax activation for the last layer, like this:
...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # raw, unnormalized logits
model = Model(inputs, conv9)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True),
the training converges for one training image.
My ground-truth dataset is generated like this:
import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)                           # input image, shape (height, width, 3)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):              # one mask file per class index c
    mask = cv2.imread(spath, 0)                   # grayscale mask (often 0/255, not 0/1)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width * height, n_classes))
Why? Is there something wrong with my usage?
Here is my experiment code on GitHub: https://github.com/honeytidy/unet. You can check it out and run it (it runs on CPU). You can change the Activation layer and the from_logits argument of CategoricalCrossentropy and see what I described.
from_logits=True signifies that the values the model outputs are not normalized (they are raw logits); it is basically used when the model does not have a softmax function in its final layer.
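For illustration, here is a minimal sketch (the logits and labels below are made up): with moderate values, passing raw logits with from_logits=True and passing softmax probabilities with from_logits=False give approximately the same loss; the two only diverge for numerical reasons when the values become extreme.

import tensorflow as tf

# Made-up logits for 3 classes and a one-hot target.
logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([[1.0, 0.0, 0.0]])

# Option 1: pass raw logits and let the loss apply softmax internally.
cce_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(cce_logits(y_true, logits).numpy())   # ~0.417

# Option 2: apply softmax ourselves and pass probabilities.
probs = tf.nn.softmax(logits)
cce_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(cce_probs(y_true, probs).numpy())     # ~0.417 as well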
Cross-entropy loss is used when adjusting model weights during training. The aim is to minimize the loss, i.e., the smaller the loss, the better the model. A perfect model has a cross-entropy loss of 0.
As another example, let's look at how to compute the mean squared error loss in TensorFlow with y_true = [1., 0.] and y_pred = [2., 3.]. This gives 5.0 as expected, since $\frac{1}{2}[(2-1)^2 + (3-0)^2] = \frac{1}{2}(10) = 5$.
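A minimal sketch of that computation with tf.keras.losses.MeanSquaredError, using the values quoted above:

import tensorflow as tf

y_true = [1., 0.]
y_pred = [2., 3.]

# MeanSquaredError averages the squared differences over the elements.
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())  # 5.0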
from_logits=True signifies that the values the model outputs are not normalized; it is basically used when the model does not end with a softmax activation. For example, in the DCGAN tutorial at https://www.tensorflow.org/tutorials/generative/dcgan the model does not use a softmax (or sigmoid) activation on its output; in other words, leaving the output as raw logits and letting the loss handle the normalization helps with numerical stability.
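For illustration, a minimal sketch of that pattern (the layer sizes and input shape are made up, only loosely following the DCGAN tutorial): the final Dense layer has no activation, so the model outputs raw logits, and the loss is constructed with from_logits=True.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical binary classifier / discriminator head.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),          # made-up input size
    layers.Dense(64, activation='relu'),
    layers.Dense(1),                       # no sigmoid: the output is a raw logit
])

# The loss is told to expect logits and applies the sigmoid internally,
# in a numerically stable way.
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn)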
Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
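For illustration, here is a minimal sketch of the kind of numerical issue being described (the logit values are deliberately extreme and made up): applying softmax explicitly and then taking the log can underflow to log(0), giving an infinite loss, while the fused computation on logits stays finite.

import tensorflow as tf

# Extreme (made-up) logits: the true class gets a very negative score.
logits = tf.constant([[200.0, -200.0]])
y_true = tf.constant([[0.0, 1.0]])

# Naive two-step computation: softmax, then log. The probability of the
# true class underflows to 0 in float32, so the loss blows up to inf.
probs = tf.nn.softmax(logits)
naive_loss = -tf.reduce_sum(y_true * tf.math.log(probs), axis=-1)
print(naive_loss.numpy())   # [inf]

# Fused computation on logits (what from_logits=True uses internally):
# log-softmax is evaluated in a numerically stable way, so the loss is finite.
stable_loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits)
print(stable_loss.numpy())  # [400.]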