I am running https://github.com/huggingface/transformers/blob/master/examples/run_glue.py to fine-tune on a binary classification task (CoLA). I'd like to monitor both the training and evaluation losses to prevent overfitting.
I am currently on library version 2.8.0, installed from source.
When I run the example with
python run_glue.py --model_name_or_path bert-base-uncased \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --data_dir my_dir \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 8 \
  --per_gpu_eval_batch_size 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./outputs \
  --logging_steps 5
I see lines in the stdout logs with a single loss value, such as
{"learning_rate": 3.3333333333333333e-06, "loss": 0.47537623047828675, "step": 25}
Peeking into https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py, I see that the training and evaluation losses are computed there (it looks like that code was recently refactored).
I have thus replaced https://github.com/huggingface/transformers/blob/abb1fa3f374811ea09d0bc3440d820c50735008d/src/transformers/trainer.py#L314 with
cr_loss = self._training_step(model, inputs, optimizer)
tr_loss += cr_loss
and added the following after line https://github.com/huggingface/transformers/blob/abb1fa3f374811ea09d0bc3440d820c50735008d/src/transformers/trainer.py#L345:
logs["training loss"] = cr_loss
With this I get:
0502 14:12:18.644119 23632 summary.py:47] Summary name training loss is illegal; using training_loss instead.
| 4/10 [00:02<00:04, 1.49it/s]
{"learning_rate": 3.3333333333333333e-06, "loss": 0.47537623047828675, "training loss": 0.5451719760894775, "step": 25}
Is this OK, or am I doing something wrong here?
What's the best way to monitor both the averaged training loss and the evaluation loss in stdout, for a given logging interval, during fine-tuning?
There's likely no change needed in the code if you install a more recent version (I tried 2.9.0 via pip): just fire the fine-tuning with the additional flag --evaluate_during_training and the output will include the evaluation loss.
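For example, the same invocation as above with the flag added:

python run_glue.py --model_name_or_path bert-base-uncased \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --evaluate_during_training \
  --data_dir my_dir \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 8 \
  --per_gpu_eval_batch_size 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./outputs \
  --logging_steps 5

Each logging interval then also runs an evaluation pass: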
I0506 12:11:30.021593 34540 trainer.py:551] ***** Running Evaluation *****
I0506 12:11:30.022596 34540 trainer.py:552] Num examples = 140
I0506 12:11:30.023634 34540 trainer.py:553] Batch size = 8
Evaluation: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:19<00:00, 1.10s/it]
{"eval_mcc": 0.0, "eval_loss": 0.6600487811697854, "learning_rate": 3.3333333333333333e-06, "loss": 0.50044886469841, "step": 25}
Beware that the example scripts change quite frequently, so the flags needed to accomplish this may be renamed; see also https://discuss.huggingface.co/t/how-to-monitor-both-train-and-validation-metrics-at-the-same-step/1301
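If you drive the Trainer from your own script rather than through run_glue.py, the same switch is exposed on TrainingArguments. A minimal sketch using the 2.9-era argument names (later releases rename some of these, e.g. evaluate_during_training was replaced by evaluation_strategy, and per_gpu_* by per_device_*):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",
    per_gpu_train_batch_size=8,  # 2.9-era name; check your installed version
    per_gpu_eval_batch_size=8,
    learning_rate=2e-5,
    num_train_epochs=3.0,
    logging_steps=5,
    evaluate_during_training=True,  # run and log evaluation every logging_steps
)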