I would like to use a custom function with cross_validate which computes precision against a specific y_test; this is a different y_test than the actual target y. I have tried a few approaches with make_scorer, but I don't know how to actually pass my alternative y_test:
scoring = {'prec1': 'precision',
           'custom_prec1': make_scorer(precision_score)}
scores = cross_validate(pipeline, X, y, cv=5, scoring=scoring)
Can anyone suggest an approach?
It is possible to change this by using the scoring parameter:

>>> from sklearn import metrics
>>> scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
>>> scores
array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])

See "The scoring parameter: defining model evaluation rules" for details.
The easiest way to use cross-validation with scikit-learn is the cross_val_score function. The function uses the default scoring method of each model. For example, if you use Gaussian Naive Bayes, the scoring method is the mean accuracy on the given test data and labels.
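As a minimal sketch of that default behaviour (toy random data, just for illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

# No scoring argument: GaussianNB's own .score() is used,
# which is the mean accuracy on each held-out fold
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(scores)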
The cross_validate function differs from cross_val_score in two ways:

- It allows specifying multiple metrics for evaluation.
- It returns a dict containing fit-times, score-times (and optionally training scores as well as fitted estimators) in addition to the test score.
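A short sketch of that return structure, assuming toy data and two standard metric names:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

results = cross_validate(LogisticRegression(), X, y, cv=5,
                         scoring=['accuracy', 'f1'],
                         return_train_score=True)
# keys: 'fit_time', 'score_time', 'test_accuracy', 'train_accuracy',
#       'test_f1', 'train_f1'
print(sorted(results.keys()))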
The scoring parameter sets the strategy used to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use a callable (see "Defining your scoring strategy from metric functions") that returns a single value.
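A minimal sketch of such a callable; the signature (estimator, X, y) is what scikit-learn expects, and the hand-rolled accuracy is just an illustrative assumption:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

def my_scorer(estimator, X_test, y_test):
    # a scoring callable receives the fitted estimator and the test split,
    # and must return a single float
    return (estimator.predict(X_test) == y_test).mean()

scores = cross_validate(LogisticRegression(), X, y, cv=5, scoring=my_scorer)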
I found a way. Maybe the code is not optimal, sorry for that. Okay, let's start:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

xTrain = np.random.rand(100, 100)
yTrain = np.random.randint(1, 4, (100, 1))    # the actual target
yTrainCV = np.random.randint(1, 4, (100, 1))  # the alternative target
model = LogisticRegression()
yTrainCV will be used here as the alternative target inside the custom scorer.
def customLoss(y_true, y_pred):
    # y_true keeps its pandas index because y is passed to fit as a DataFrame;
    # those indices tell us which rows of yTrainCV belong to this CV fold
    indices = y_true.index.values
    # count the predictions that disagree with the alternative target
    tempArray = [1 if value1 != value2 else 0
                 for value1, value2 in zip(y_pred, yTrainCV[indices].ravel())]
    return sum(tempArray)
scorer = {'main': 'accuracy',
          'custom': make_scorer(customLoss, greater_is_better=True)}
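Since the question asked about cross_validate rather than GridSearchCV, the same scorer dict should plug in there too; a sketch continuing the variables above (the DataFrame trick is still needed so the fold indices survive):

from sklearn.model_selection import cross_validate

scores = cross_validate(model, xTrain, pd.DataFrame(yTrain), cv=5, scoring=scorer)
# scores['test_main'] holds accuracy per fold,
# scores['test_custom'] holds the custom score per fold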
A few tricks here:

- greater_is_better: True/False will make the scorer return either a positive or a negative number
- the refit parameter in GridSearchCV picks which metric the final refit uses
- passing y as a DataFrame - it will help us to detect the fold indices in the custom loss function

And...
grid = GridSearchCV(model,
                    scoring=scorer,
                    cv=5,
                    param_grid={'C': [1e0, 1e1, 1e2, 1e3],
                                'class_weight': ['balanced', None]},
                    refit='custom')  # refit the best model on the custom metric
grid.fit(xTrain, pd.DataFrame(yTrain))  # DataFrame keeps the row indices
print(grid.score(xTrain, pd.DataFrame(yTrain)))
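To compare both metrics across the whole grid, cv_results_ exposes one mean_test_<name> column per scorer; a sketch using the names from the scorer dict above:

results = pd.DataFrame(grid.cv_results_)
print(results[['param_C', 'param_class_weight',
               'mean_test_main', 'mean_test_custom']])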