
Passing two evaluation datasets to HuggingFace Trainer objects

Is there any way to pass two evaluation datasets to a HuggingFace Trainer object so that the trained model can be evaluated on two different sets (say, an in-distribution and an out-of-distribution set) during training? Here is the instantiation of the object, which accepts just one eval_dataset:

trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer
)
asked Sep 03 '25 by Hossein

1 Answer

I am looking for a solution too. In the meantime, I can propose this workaround:

Trainer doesn't shuffle the examples in the dataset during evaluation, so you can merge the two datasets into a single evaluation set, as long as you control the merge and know how many examples come from each dataset. You then split the examples back apart when calculating your metrics.
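For example, with the datasets library you could build the merged evaluation set along these lines (a minimal sketch; eval_dataset_1 and eval_dataset_2 are placeholder names for your in-distribution and out-of-distribution sets):

from datasets import concatenate_datasets

# eval_dataset_1 / eval_dataset_2 are hypothetical names for the two evaluation sets;
# concatenate_datasets preserves order, so the first len(eval_dataset_1) rows stay in front
merged_eval_dataset = concatenate_datasets([eval_dataset_1, eval_dataset_2])
dataset_1_size = len(eval_dataset_1)  # needed later to split the predictions again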

For this, you will have to implement your own compute_metrics callable and pass it via trainer = Trainer(compute_metrics=my_compute_metrics). Note that you don't control the arguments of this function, so it's probably best to implement it as a method of a class and pass your dataset sizes / composition to the constructor.

I can imagine doing something like this (not sure that the tensor shapes and axes are correct):

from typing import Dict

from transformers import EvalPrediction


class MetricCollection:
    def __init__(self, dataset_1_size):
        # number of examples contributed by the first dataset in the merged eval set
        self.dataset_1_size = dataset_1_size

    @staticmethod
    def mymetric(labels, predicted_scores):
        ...  # compute your metric here
        return result

    def compute_metrics(self, p: EvalPrediction) -> Dict:
        metrics = {}

        # split labels and predictions back into the two original datasets
        labels_1 = p.label_ids[:self.dataset_1_size]
        labels_2 = p.label_ids[self.dataset_1_size:]

        predictions_1 = p.predictions[:self.dataset_1_size, :]
        predictions_2 = p.predictions[self.dataset_1_size:, :]

        metrics['mymetric_dataset_1'] = self.mymetric(labels_1, predictions_1)
        metrics['mymetric_dataset_2'] = self.mymetric(labels_2, predictions_2)

        return metrics

and then in your main code

metric_calculator = MetricCollection(dataset_1_size)

trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=merged_eval_dataset,  # the concatenation of the two evaluation sets
    tokenizer=tokenizer,
    compute_metrics=metric_calculator.compute_metrics  # pass the bound method, don't call it
)
answered Sep 05 '25 by Vladimir Maryasin