I am doing image semantic segmentation with U-Net. If I set a softmax activation for the
last layer, like this:
...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # per-pixel class scores
conv10 = Activation('softmax')(conv9)                      # softmax over the class (last) axis
model = Model(inputs, conv10)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False),
the training does not converge, even for a single training image.
But if I do not set a softmax activation for the last layer, like this:
...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # raw, unnormalized logits
model = Model(inputs, conv9)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True),
the training converges for one training image.
My ground-truth dataset is generated like this:
import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)                           # input image, shape (height, width, 3)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):              # one mask file per class index c
    mask = cv2.imread(spath, 0)                   # grayscale mask (often 0/255, not 0/1)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width * height, n_classes))
Why? Is there something wrong with my usage?
Here is my experiment code on GitHub: https://github.com/honeytidy/unet. You can check it out and run it (it runs on CPU). You can change the Activation layer and the from_logits argument of CategoricalCrossentropy and see what I described.
from_logits=True signifies that the values the model outputs are not normalized (they are raw logits); it is basically used when the model does not have a softmax function in its final layer.
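For illustration, here is a minimal sketch (the logits and labels below are made up): with moderate values, passing raw logits with from_logits=True and passing softmax probabilities with from_logits=False give approximately the same loss; the two only diverge for numerical reasons when the values become extreme.

import tensorflow as tf

# Made-up logits for 3 classes and a one-hot target.
logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([[1.0, 0.0, 0.0]])

# Option 1: pass raw logits and let the loss apply softmax internally.
cce_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(cce_logits(y_true, logits).numpy())   # ~0.417

# Option 2: apply softmax ourselves and pass probabilities.
probs = tf.nn.softmax(logits)
cce_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(cce_probs(y_true, probs).numpy())     # ~0.417 as well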
Cross-entropy loss is used when adjusting model weights during training. The aim is to minimize the loss, i.e., the smaller the loss, the better the model. A perfect model has a cross-entropy loss of 0.
As another example, let's look at how to compute the mean squared error loss in TensorFlow with y_true = [1., 0.] and y_pred = [2., 3.]. This gives 5.0 as expected, since $\frac{1}{2}[(2-1)^2 + (3-0)^2] = \frac{1}{2}(10) = 5$.
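A minimal sketch of that computation with tf.keras.losses.MeanSquaredError, using the values quoted above:

import tensorflow as tf

y_true = [1., 0.]
y_pred = [2., 3.]

# MeanSquaredError averages the squared differences over the elements.
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())  # 5.0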
from_logits=True signifies that the values the model outputs are not normalized; it is basically used when the model does not end with a softmax activation. For example, in the DCGAN tutorial at https://www.tensorflow.org/tutorials/generative/dcgan the model does not use a softmax (or sigmoid) activation on its output; in other words, leaving the output as raw logits and letting the loss handle the normalization helps with numerical stability.
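For illustration, a minimal sketch of that pattern (the layer sizes and input shape are made up, only loosely following the DCGAN tutorial): the final Dense layer has no activation, so the model outputs raw logits, and the loss is constructed with from_logits=True.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical binary classifier / discriminator head.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),          # made-up input size
    layers.Dense(64, activation='relu'),
    layers.Dense(1),                       # no sigmoid: the output is a raw logit
])

# The loss is told to expect logits and applies the sigmoid internally,
# in a numerically stable way.
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn)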
Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
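For illustration, here is a minimal sketch of the kind of numerical issue being described (the logit values are deliberately extreme and made up): applying softmax explicitly and then taking the log can underflow to log(0), giving an infinite loss, while the fused computation on logits stays finite.

import tensorflow as tf

# Extreme (made-up) logits: the true class gets a very negative score.
logits = tf.constant([[200.0, -200.0]])
y_true = tf.constant([[0.0, 1.0]])

# Naive two-step computation: softmax, then log. The probability of the
# true class underflows to 0 in float32, so the loss blows up to inf.
probs = tf.nn.softmax(logits)
naive_loss = -tf.reduce_sum(y_true * tf.math.log(probs), axis=-1)
print(naive_loss.numpy())   # [inf]

# Fused computation on logits (what from_logits=True uses internally):
# log-softmax is evaluated in a numerically stable way, so the loss is finite.
stable_loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits)
print(stable_loss.numpy())  # [400.]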