I am using dice loss for my implementation of a Fully Convolutional Network (FCN) which involves hypernetworks. The model has two inputs and one output, which is a binary segmentation map. The model is updating its weights, but the loss stays constant. It does not even overfit on only three training examples.
I have tried other loss functions as well, such as dice + binary cross-entropy loss, Jaccard loss and MSE loss, but the loss is still almost constant. I have also tried almost every activation function, like ReLU, LeakyReLU and Tanh. I have to use sigmoid at the output because I need my outputs to be in the range [0,1]. The learning rate is 0.01, and I have tried different learning rates as well, like 0.0001, 0.001 and 0.1. No matter what loss the training starts at, it always ends up at this value.
Below are the gradients for the three training examples, followed by the overall loss for each epoch:
tensor(0.0010, device='cuda:0')
tensor(0.1377, device='cuda:0')
tensor(0.1582, device='cuda:0')
Epoch 9, Overall loss = 0.9604763123724196, mIOU=0.019766070265581623
tensor(0.0014, device='cuda:0')
tensor(0.0898, device='cuda:0')
tensor(0.0455, device='cuda:0')
Epoch 10, Overall loss = 0.9616242945194244, mIOU=0.01919178702228237
tensor(0.0886, device='cuda:0')
tensor(0.2561, device='cuda:0')
tensor(0.0108, device='cuda:0')
Epoch 11, Overall loss = 0.960331304506822, mIOU=0.01983801422510155
I expect the loss to converge in a few epochs. What should I do?
This is not really a question for Stack Overflow. There are a million things that could be wrong, it's usually not possible to post enough code for us to pinpoint the issue, and even if it were, nobody would bother reading that much.
That being said, there are some general guidelines which often work for me.
One common mistake is returning torch.sigmoid(x) from your network and then feeding it into torch.nn.functional.binary_cross_entropy_with_logits (effectively applying sigmoid twice). Maybe your last layer is ReLU and your network just cannot (by construction) output negative values where you would expect them.

Finally, I've personally never had much success training with dice as the primary loss function, so I would definitely try to get it working with cross entropy first, and then move on to dice.
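
To make the double-sigmoid mistake concrete, here is a minimal sketch. The logits and target values are made up for illustration and have nothing to do with the asker's model. It compares passing raw logits to binary_cross_entropy_with_logits against passing already-sigmoided outputs to it.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw network outputs (logits) and a binary target mask.
logits = torch.tensor([[-4.0, -2.0, 0.0, 2.0, 4.0]])
target = torch.tensor([[0.0, 0.0, 0.0, 1.0, 1.0]])

# Correct: pass raw logits; the loss applies the sigmoid internally.
loss_ok = F.binary_cross_entropy_with_logits(logits, target)

# Bug: sigmoid applied in the model, then applied again inside the loss.
probs = torch.sigmoid(logits)
loss_double = F.binary_cross_entropy_with_logits(probs, target)

# Correct alternative when the model already outputs probabilities.
loss_probs = F.binary_cross_entropy(probs, target)

# What the loss effectively "sees" after two sigmoids: since the first
# sigmoid maps into (0, 1), the second one maps into (0.5, ~0.731).
double_probs = torch.sigmoid(probs)

print(loss_ok.item(), loss_probs.item(), loss_double.item())
print(double_probs.min().item(), double_probs.max().item())
```

With the double sigmoid, the effective predictions are squeezed into a narrow band above 0.5, so the loss cannot get close to zero no matter what the network learns. If your model must end in a sigmoid, pair it with binary_cross_entropy; if it returns raw logits, pair it with binary_cross_entropy_with_logits.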