Loss not decreasing - Pytorch

Question

I am using dice loss for my implementation of a Fully Convolutional Network(FCN) which involves hypernetworks. The model has two inputs and one output which is a binary segmentation map. The model is updating weights but loss is constant. It is not even overfitting on only three training examples

I have used other loss functions as well like dice+binarycrossentropy loss, jacard loss and MSE loss but the loss is almost constant. I have also tried almost every activation function like ReLU, LeakyReLU, Tanh. Moreover I have to use sigmoid at the the output because I need my outputs to be in range [0,1] Learning rate is 0.01. Moreover, I have tried different learning rates as well like 0.0001, 0.001, 0.1. And no matter what loss the training starts at, it always comes at this value

This shows gradients for three training examples. And overall loss

tensor(0.0010, device='cuda:0')
tensor(0.1377, device='cuda:0')
tensor(0.1582, device='cuda:0')
Epoch 9, Overall loss = 0.9604763123724196, mIOU=0.019766070265581623
tensor(0.0014, device='cuda:0')
tensor(0.0898, device='cuda:0')
tensor(0.0455, device='cuda:0')
Epoch 10, Overall loss = 0.9616242945194244, mIOU=0.01919178702228237
tensor(0.0886, device='cuda:0')
tensor(0.2561, device='cuda:0')
tensor(0.0108, device='cuda:0')
Epoch 11, Overall loss = 0.960331304506822, mIOU=0.01983801422510155

I expect the loss to converge in few epochs. What should I do?

Jatentaki · Accepted Answer

It's not really a question for stack overflow. There's a million things which could be wrong and it's usually not possible to post enough code to allow us to pinpoint the issue, and even if it were, nobody could bother reading that much.

That being said, there are some general guidelines which often work for me.

Try reducing the problem. If you replace your network with a single convolutional layer, will it converge? If yes, apparently something's wrong with your network
Look at the data as you feed it as well as the labels (matplotlib plots, etc). Perhaps you're misaligning input with output (cropping issues, etc) or your data augmentation is way too strong.
Look for, well..., bugs. Perhaps you're returning torch.sigmoid(x) from your network and then feeding it into torch.nn.functional.binary_cross_entropy_with_logits (effectively applying sigmoid twice). Maybe your last layer is ReLU and your network just cannot (by construction) output negative values where you would expect them.

Finally, I've personally never had much success training with dice as the primary loss function, so I would definitely try to get it working with cross entropy first, and then move on to dice.

Loss not decreasing - Pytorch

Tags:

machine-learning

pytorch

conv-neural-network

Muhammad Hamza Mughal

1 Answers

Jatentaki

Recent Activity

Donate For Us

Loss not decreasing - Pytorch

Tags:

machine-learning

pytorch

conv-neural-network

Muhammad Hamza Mughal

1 Answers

Jatentaki

Related questions

Recent Activity

Donate For Us