I have PyTorch Lightning code that works perfectly for a binary classification task with bert-base-uncased or roberta-base, but doesn't work with roberta-large, i.e., the training loss doesn't come down.
I have no clue why this is happening and am looking for possible reasons for such an issue.
Edit: I'm training on the MNLI dataset (only the entailment and contradiction classes). The model is predicting the same class for all examples.
Thanks
I decreased the learning rate slightly and the issue seems to be fixed. It's surprising that changing the learning rate from 5e-5 to 5e-6 can have so much impact.
Now, the bigger question is: how do I find the right set of hyperparameters?
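A common starting point is a small grid search over the learning rate on a log scale, training briefly with each candidate and keeping the one with the best validation score. The sketch below shows the search loop only; `train_and_evaluate` is a hypothetical stand-in for a short PyTorch Lightning run (e.g. one epoch of `trainer.fit` followed by validation), and the fake score curve inside it is for illustration only:

```python
import math

def train_and_evaluate(lr):
    # Hypothetical placeholder: in practice, construct the LightningModule
    # with this learning rate, run trainer.fit(...) for a short budget,
    # and return the validation accuracy. Here we fake a curve that
    # peaks near 5e-6 purely so the sketch runs end to end.
    return 1.0 / (1.0 + abs(math.log10(lr) - math.log10(5e-6)))

def grid_search(candidate_lrs):
    """Try each learning rate and return the best one with all scores."""
    results = {lr: train_and_evaluate(lr) for lr in candidate_lrs}
    best_lr = max(results, key=results.get)
    return best_lr, results

# Sweep on a log scale; larger models such as roberta-large often need
# the lower end of the typical fine-tuning range.
candidates = [5e-5, 2e-5, 1e-5, 5e-6, 1e-6]
best_lr, scores = grid_search(candidates)
```

For anything beyond a single learning rate, dedicated tools like Optuna or Ray Tune can run the same kind of search more efficiently (e.g. with early stopping of bad trials).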