
Custom Scoring Function in sklearn Cross Validate

I would like to use a custom scoring function with cross_validate that computes precision against a specific y_test, which is different from the actual target y_test.

I have tried a few approaches with make_scorer but I don't know how to actually pass my alternative y_test:

scoring = {'prec1': 'precision',
           'custom_prec1': make_scorer(precision_score)}

scores = cross_validate(pipeline, X, y, cv=5, scoring=scoring)

Can anyone suggest an approach?

asked Jan 07 '19 by Tartaglia

People also ask

How to change the score of a model in sklearn?

It is possible to change this by using the scoring parameter:

>>> from sklearn import metrics
>>> scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
>>> scores
array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])

See "The scoring parameter: defining model evaluation rules" in the sklearn docs for details.
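For reference, here is a minimal runnable version of that snippet; the dataset and classifier are placeholders for illustration, not part of the quoted docs:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# illustrative data and model; any classifier works here
X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel='linear', C=1, random_state=0)

# scoring='f1_macro' overrides the estimator's default score
print(cross_val_score(clf, X, y, cv=5, scoring='f1_macro'))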

How to use cross-validation with scikit-learn?

The easiest way to use cross-validation with scikit-learn is the cross_val_score function. The function uses the default scoring method of each model. For example, if you use Gaussian Naive Bayes, the scoring method is the mean accuracy on the given test data and labels.
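As a sketch (the iris dataset here is just an illustration), calling cross_val_score without a scoring argument falls back to the estimator's own score method:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# no scoring argument: GaussianNB.score is mean accuracy
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(scores.mean())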

What is the difference between cross_val_score and cross_validate?

The cross_validate function differs from cross_val_score in two ways: It allows specifying multiple metrics for evaluation. It returns a dict containing fit-times, score-times (and optionally training scores as well as fitted estimators) in addition to the test score.
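A short sketch of that difference (the dataset and model are illustrative):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

# two metrics at once; the result is a dict of per-fold arrays
res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                     scoring=['accuracy', 'f1_macro'])
print(sorted(res))  # fit_time, score_time, test_accuracy, test_f1_macro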

How do you evaluate the performance of a cross-validated model?

Strategy to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use: a callable (see Defining your scoring strategy from metric functions) that returns a single value.
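For example, such a callable receives (estimator, X, y) and returns a single float; the function name below is made up for illustration:

import numpy as np

def single_value_scorer(estimator, X, y):
    # hypothetical example: hand-computed accuracy as a single float
    return float(np.mean(estimator.predict(X) == y))

# usage: cross_validate(model, X, y, cv=5, scoring=single_value_scorer)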


1 Answer

I found a way to do it. The code may not be optimal, sorry for that.

Okay, let's start:

import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

xTrain = np.random.rand(100, 100)
yTrain = np.random.randint(1, 4, (100, 1))

# the alternative target, used only inside the custom scorer
yTrainCV = np.random.randint(1, 4, (100, 1))

model = LogisticRegression()

yTrainCV will be used as the alternative target inside the custom scorer.

def customLoss(xArray, yArray):
    # xArray is the fold's true target slice (a DataFrame, so it keeps
    # the original row indices); yArray is the predictions (unused here)
    indices = xArray.index.values
    # count mismatches between the fold's true labels and yTrainCV
    tempArray = [1 if value1 != value2 else 0
                 for value1, value2 in zip(xArray.values, yTrainCV[indices])]

    return sum(tempArray)

scorer = {'main': 'accuracy',
          'custom': make_scorer(customLoss, greater_is_better=True)}

A few tricks here:

  • customLoss receives 2 values (the real values from the fold plus the model's predictions; we do not actually use the second parameter, the predictions)
  • greater_is_better controls the sign: True/False makes the scorer return a positive or a negative number (see the sketch after this list)
  • the indices come from the CV split performed inside GridSearchCV
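A quick sketch of that sign behavior (the toy data and DummyClassifier are just for illustration):

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def mismatches(yTrue, yPred):
    return int(np.sum(np.ravel(yTrue) != np.ravel(yPred)))

X = np.zeros((4, 1))
y = np.array([0, 1, 0, 1])
clf = DummyClassifier(strategy='constant', constant=0).fit(X, y)

# greater_is_better=False flips the sign so "higher is better" holds
pos = make_scorer(mismatches, greater_is_better=True)
neg = make_scorer(mismatches, greater_is_better=False)
print(pos(clf, X, y), neg(clf, X, y))  # 2 -2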

And...

grid = GridSearchCV(model,
                    scoring=scorer,
                    cv=5,
                    param_grid={'C': [1e0, 1e1, 1e2, 1e3],
                                'class_weight': ['balanced', None]},
                    refit='custom')

grid.fit(xTrain, pd.DataFrame(yTrain))
print(grid.score(xTrain, pd.DataFrame(yTrain)))

  • do not forget the refit parameter in GridSearchCV: with multiple metrics it must name the metric used to pick and refit the best model
  • we pass the target array as a DataFrame here - it lets us recover the original row indices in the custom loss function
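A small sketch of why the DataFrame matters (KFold here just mimics the split that happens inside GridSearchCV):

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

yDf = pd.DataFrame(np.arange(10))
for trainIdx, testIdx in KFold(n_splits=5).split(yDf):
    # the slice keeps the original row labels, unlike a plain ndarray
    print(yDf.iloc[testIdx].index.values)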
answered Nov 16 '22 by avchauzov