When people tackle semantic segmentation with CNNs, they usually train with a softmax cross-entropy loss (see Fully Convolutional Networks, Long et al.). But when it comes to comparing the performance of different approaches, measures like intersection-over-union (IoU) are reported.
My question is: why don't people train directly on the measure they want to optimize? It seems odd to optimize one measure during training but evaluate a different one for benchmarks.
I can see that IoU is problematic for training samples where a class is not present (union = 0 and intersection = 0, giving a 0/0 division). But if I can ensure that every ground-truth sample contains all classes, is there another reason not to use this measure?
By replacing the hard set operations (intersection and union of binary masks) with a soft approximation over predicted probabilities, IoU becomes differentiable and can be used as a loss function. The comparison between the IoU loss and the binary cross-entropy loss is made by testing two deep neural network models on multiple datasets and data splits.
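As a sketch of the idea (not the exact formulation from any particular paper): if the network outputs per-pixel probabilities, the intersection can be approximated by the elementwise product with the ground truth and the union by the inclusion-exclusion sum, both of which are differentiable. The small `eps` constant is an assumption added here to handle the 0/0 case mentioned in the question.

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-6):
    """Differentiable (soft) IoU loss for a binary mask.

    pred   -- predicted probabilities in [0, 1] (e.g. sigmoid outputs)
    target -- ground-truth mask of 0s and 1s, same shape as pred
    eps    -- small smoothing constant; avoids 0/0 when both masks are empty

    Soft intersection: sum(pred * target); soft union via inclusion-exclusion:
    sum(pred) + sum(target) - intersection. Loss is 1 - IoU, so a perfect
    prediction gives 0 and a fully disjoint one gives a value near 1.
    """
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

In a real training loop the same arithmetic would be written with the framework's tensor ops (e.g. PyTorch or TensorFlow) so gradients flow through `pred`; the NumPy version above only illustrates the computation.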
To define the term: in machine learning, IoU means Intersection over Union, a metric used to evaluate deep learning algorithms by estimating how well a predicted mask or bounding box matches the ground truth.
This measure of similarity is the Jaccard index or, in the colloquial language of computer vision, intersection over union: IoU = Area of Intersection / Area of Union. An IoU score ≥ 0.5 is commonly considered good.
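For the bounding-box case, the definition above reduces to a few lines of arithmetic on axis-aligned boxes. This is a minimal sketch; the `(x1, y1, x2, y2)` corner convention is an assumption, not something fixed by the text.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners.

    The intersection rectangle is bounded by the larger of the two top-left
    corners and the smaller of the two bottom-right corners; if the boxes
    do not overlap, its width or height clamps to 0.
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two identical boxes give an IoU of 1.0, while two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square and give 1/7.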
Since IoU ranges from 0 to 1, it is often expressed as a percentage; however, what a given IoU score means in terms of visual error is not intuitive (to me at least).
Check out this paper, where they come up with a way to make IoU differentiable. I implemented their solution with great results!