I understand imbalance in an image classification problem such as cat vs. dog classification: there are too many cat images and too few dog images. But I don't know how to address imbalance in a segmentation problem.
For example, my task is to mask cloud cover in satellite images, so I cast the problem as two-class segmentation: one class is cloud, the other is background. The dataset has 5800 4-band, 16-bit images of size 256×256. The architecture is SegNet and the loss function is binary cross-entropy.
Suppose there are two cases:
So case 2 is balanced, I guess, but what about case 1?
In reality, and in my task, neither case occurs in the source satellite images, since cloud cover is always relatively small compared to the background. But when samples are cropped from the large source images, new cases emerge.
So the cropped samples always contain three types of images:
My question:
Are the samples imbalanced and what should I do?
Thanks in advance.
Usually, in segmentation tasks one considers the samples "balanced" if, for each image, the number of pixels belonging to each class/segment is roughly the same (case 2 in your question).
In most cases the samples are never balanced, as in your example.
What can go wrong? When one segment/class dominates the samples, the model may find it easier to output all pixels as belonging to the dominant class/segment. This constant prediction, although uninformative, can still yield high accuracy and a small loss.
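To see this trap concretely, here is a minimal NumPy sketch (the 5% cloud fraction and the random mask are synthetic placeholders, not from your data): a model that outputs "background" everywhere still scores very high overall accuracy.

```python
import numpy as np

# Hypothetical 256x256 ground-truth mask: 1 = cloud, 0 = background.
# Assume only ~5% of the pixels are cloud, as is common in cropped satellite tiles.
rng = np.random.default_rng(0)
mask = (rng.random((256, 256)) < 0.05).astype(int)

# A degenerate model that predicts "background" for every pixel.
pred = np.zeros_like(mask)

# Overall pixel accuracy is ~0.95 even though no cloud was detected.
accuracy = (pred == mask).mean()
print(f"overall accuracy: {accuracy:.3f}")
```

This is why overall pixel accuracy alone is a misleading metric for imbalanced segmentation.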
How can I detect such a faulty result? You can make the "Accuracy" layer output not only the overall accuracy but also the per-class accuracy. If your model is "locked" on a single class, the per-class accuracy of all other classes will be very low.
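If your framework does not provide per-class accuracy out of the box, it is straightforward to compute yourself. A minimal sketch (the `per_class_accuracy` helper and the tiny mask below are illustrative, not part of any library):

```python
import numpy as np

def per_class_accuracy(pred, mask, num_classes=2):
    """Fraction of correctly predicted pixels within each ground-truth class."""
    accs = []
    for c in range(num_classes):
        in_class = (mask == c)
        # NaN if the class is absent from this mask.
        accs.append((pred[in_class] == c).mean() if in_class.any() else float("nan"))
    return accs

# An all-background prediction against a mask containing some cloud pixels:
mask = np.array([[0, 0, 1],
                 [0, 1, 0]])
pred = np.zeros_like(mask)
accs = per_class_accuracy(pred, mask)
# Background accuracy is 1.0, cloud accuracy is 0.0 -- the model is "locked".
print(accs)
```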
What can I do? You can use an "InfogainLoss" layer to give more weight to errors on the minority classes, countering the effect of the dominant class.
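The same idea, weighting errors on the rare class more heavily, can be expressed directly as a class-weighted binary cross-entropy. A sketch in NumPy (the weight of 10 for cloud pixels is an arbitrary illustration; in practice you would derive it from the class frequencies in your data):

```python
import numpy as np

def weighted_bce(pred, mask, w_cloud=10.0, w_bg=1.0, eps=1e-7):
    """Binary cross-entropy where cloud-pixel errors are weighted more heavily."""
    pred = np.clip(pred, eps, 1 - eps)
    weights = np.where(mask == 1, w_cloud, w_bg)
    losses = -(mask * np.log(pred) + (1 - mask) * np.log(1 - pred))
    return (weights * losses).mean()

# A confident all-background prediction against a mask with cloud pixels:
mask = np.array([[0, 0, 1],
                 [0, 1, 0]])
pred = np.full(mask.shape, 0.01)

# Weighting the cloud class makes this degenerate prediction far more costly.
print("unweighted:", weighted_bce(pred, mask, w_cloud=1.0))
print("weighted:  ", weighted_bce(pred, mask, w_cloud=10.0))
```

A common choice for the weights is the inverse class frequency, so that both classes contribute comparably to the loss.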