 

Training loss is not decreasing for roberta-large model but working perfectly fine for roberta-base, bert-base-uncased

I have PyTorch Lightning code that works perfectly for a binary classification task with bert-base-uncased or roberta-base, but it doesn't work with roberta-large, i.e. the training loss doesn't come down.

I have no clue why this is happening and am looking for possible reasons for such an issue.

Edit: I'm training on the MNLI dataset (only the entailment and contradiction classes). The model is predicting the same class for all examples.

Thanks

asked by NRJ_Varshney


1 Answer

I decreased the learning rate and the issue seems to be fixed. It's amusing to observe that changing the learning rate from 5e-5 to 5e-6 can have so much impact.
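
For concreteness, here is a minimal sketch of how that lower learning rate could be wired into a PyTorch Lightning module. The NLIClassifier class, its hyperparameter values, and the warmup schedule are hypothetical illustrations, not taken from my actual code:

    import torch
    import pytorch_lightning as pl
    from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup


    class NLIClassifier(pl.LightningModule):
        """Hypothetical two-class entailment/contradiction classifier."""

        def __init__(self, model_name="roberta-large", lr=5e-6, warmup_steps=500, total_steps=10000):
            super().__init__()
            self.save_hyperparameters()
            self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

        def training_step(self, batch, batch_idx):
            # batch is assumed to contain input_ids, attention_mask and labels
            out = self.model(**batch)
            self.log("train_loss", out.loss)
            return out.loss

        def configure_optimizers(self):
            # The key change: 5e-6 instead of 5e-5, plus linear warmup,
            # which tends to keep large models from collapsing to a single class.
            optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.lr, weight_decay=0.01)
            scheduler = get_linear_schedule_with_warmup(
                optimizer,
                num_warmup_steps=self.hparams.warmup_steps,
                num_training_steps=self.hparams.total_steps,
            )
            return [optimizer], [{"scheduler": scheduler, "interval": "step"}]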

Now, the bigger question is "How do I find the right set of hyperparameters?"
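
I don't have a definitive answer to that, but a crude first step is a small learning-rate sweep. The sketch below assumes the hypothetical NLIClassifier above logs a val_acc metric and that train_loader and val_loader are already defined:

    # Rough grid sweep over candidate learning rates (assumed setup, not my exact code).
    candidate_lrs = [1e-6, 5e-6, 1e-5, 2e-5, 5e-5]

    results = {}
    for lr in candidate_lrs:
        model = NLIClassifier(model_name="roberta-large", lr=lr)
        trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
        trainer.fit(model, train_loader, val_loader)  # dataloaders assumed to exist
        # callback_metrics holds whatever the module logged, e.g. a "val_acc" metric
        results[lr] = float(trainer.callback_metrics.get("val_acc", float("nan")))

    print(results)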

answered by NRJ_Varshney