I have PyTorch Lightning code that works perfectly for a binary classification task with bert-base-uncased or roberta-base, but doesn't work with roberta-large, i.e., the training loss doesn't come down.
I have no clue why this is happening and am looking for possible reasons for such an issue.
Edit: I'm training on the MNLI dataset (only the entailment and contradiction classes). The model is predicting the same class for all examples.
Thanks
I decreased the learning rate slightly and the issue seems to be fixed. It's surprising that changing the learning rate from 5e-5 to 5e-6 can have so much impact.
Now, the bigger question is: how do I find the right set of hyperparameters?
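A common starting point is a small grid search over the learning rate on a log scale, training briefly with each candidate and keeping the one with the best validation score. The sketch below shows the search loop only; `train_and_evaluate` is a hypothetical stand-in for a short PyTorch Lightning run (e.g. one epoch of `trainer.fit` followed by validation), and the fake score curve inside it is for illustration only:

```python
import math

def train_and_evaluate(lr):
    # Hypothetical placeholder: in practice, construct the LightningModule
    # with this learning rate, run trainer.fit(...) for a short budget,
    # and return the validation accuracy. Here we fake a curve that
    # peaks near 5e-6 purely so the sketch runs end to end.
    return 1.0 / (1.0 + abs(math.log10(lr) - math.log10(5e-6)))

def grid_search(candidate_lrs):
    """Try each learning rate and return the best one with all scores."""
    results = {lr: train_and_evaluate(lr) for lr in candidate_lrs}
    best_lr = max(results, key=results.get)
    return best_lr, results

# Sweep on a log scale; larger models such as roberta-large often need
# the lower end of the typical fine-tuning range.
candidates = [5e-5, 2e-5, 1e-5, 5e-6, 1e-6]
best_lr, scores = grid_search(candidates)
```

For anything beyond a single learning rate, dedicated tools like Optuna or Ray Tune can run the same kind of search more efficiently (e.g. with early stopping of bad trials).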