
Custom Scoring Function in sklearn Cross Validate

I would like to use a custom scoring function with cross_validate that computes precision against a specific y_test, which is different from the actual target y_test.

I have tried a few approaches with make_scorer but I don't know how to actually pass my alternative y_test:

scoring = {'prec1': 'precision',
           'custom_prec1': make_scorer(precision_score)}

scores = cross_validate(pipeline, X, y, cv=5, scoring=scoring)

Can anyone suggest an approach?

asked Jan 07 '19 by Tartaglia

People also ask

How to change the score of a model in sklearn?

It is possible to change this by using the scoring parameter:

>>> from sklearn import metrics
>>> scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
>>> scores
array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])

See "The scoring parameter: defining model evaluation rules" in the sklearn docs for details.
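For reference, here is a minimal runnable version of that snippet; the dataset and classifier are placeholders for illustration, not part of the quoted docs:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# illustrative data and model; any classifier works here
X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel='linear', C=1, random_state=0)

# scoring='f1_macro' overrides the estimator's default score
print(cross_val_score(clf, X, y, cv=5, scoring='f1_macro'))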

How to use cross-validation with scikit-learn?

The easiest way to use cross-validation with scikit-learn is the cross_val_score function. The function uses the default scoring method of each model. For example, if you use Gaussian Naive Bayes, the scoring method is the mean accuracy on the given test data and labels.
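As a sketch (the iris dataset here is just an illustration), calling cross_val_score without a scoring argument falls back to the estimator's own score method:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# no scoring argument: GaussianNB.score is mean accuracy
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(scores.mean())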

What is the difference between cross_val_score and cross_validate?

The cross_validate function differs from cross_val_score in two ways: It allows specifying multiple metrics for evaluation. It returns a dict containing fit-times, score-times (and optionally training scores as well as fitted estimators) in addition to the test score.
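A short sketch of that difference (the dataset and model are illustrative):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

# two metrics at once; the result is a dict of per-fold arrays
res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                     scoring=['accuracy', 'f1_macro'])
print(sorted(res))  # fit_time, score_time, test_accuracy, test_f1_macro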

How do you evaluate the performance of a cross-validated model?

Strategy to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use: a callable (see Defining your scoring strategy from metric functions) that returns a single value.
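For example, such a callable receives (estimator, X, y) and returns a single float; the function name below is made up for illustration:

import numpy as np

def single_value_scorer(estimator, X, y):
    # hypothetical example: hand-computed accuracy as a single float
    return float(np.mean(estimator.predict(X) == y))

# usage: cross_validate(model, X, y, cv=5, scoring=single_value_scorer)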


1 Answer

I found a way to do it. The code may not be optimal, sorry for that.

Okay, let's start:

import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

xTrain = np.random.rand(100, 100)
yTrain = np.random.randint(1, 4, (100, 1))

# the alternative target, used only inside the custom scorer
yTrainCV = np.random.randint(1, 4, (100, 1))

model = LogisticRegression()

yTrainCV will be used as the alternative target inside the custom scorer.

def customLoss(xArray, yArray):
    # xArray is the fold's true target slice (a DataFrame, so it keeps
    # the original row indices); yArray is the predictions (unused here)
    indices = xArray.index.values
    # count mismatches between the fold's true labels and yTrainCV
    tempArray = [1 if value1 != value2 else 0
                 for value1, value2 in zip(xArray.values, yTrainCV[indices])]

    return sum(tempArray)

scorer = {'main': 'accuracy',
          'custom': make_scorer(customLoss, greater_is_better=True)}

A few tricks here:

  • customLoss receives 2 values (the real values from the fold plus the model's predictions; we do not actually use the second parameter, the predictions)
  • greater_is_better controls the sign: True/False makes the scorer return a positive or a negative number (see the sketch after this list)
  • the indices come from the CV split performed inside GridSearchCV
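A quick sketch of that sign behavior (the toy data and DummyClassifier are just for illustration):

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def mismatches(yTrue, yPred):
    return int(np.sum(np.ravel(yTrue) != np.ravel(yPred)))

X = np.zeros((4, 1))
y = np.array([0, 1, 0, 1])
clf = DummyClassifier(strategy='constant', constant=0).fit(X, y)

# greater_is_better=False flips the sign so "higher is better" holds
pos = make_scorer(mismatches, greater_is_better=True)
neg = make_scorer(mismatches, greater_is_better=False)
print(pos(clf, X, y), neg(clf, X, y))  # 2 -2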

And...

grid = GridSearchCV(model,
                    scoring=scorer,
                    cv=5,
                    param_grid={'C': [1e0, 1e1, 1e2, 1e3],
                                'class_weight': ['balanced', None]},
                    refit='custom')

grid.fit(xTrain, pd.DataFrame(yTrain))
print(grid.score(xTrain, pd.DataFrame(yTrain)))

  • do not forget the refit parameter in GridSearchCV: with multiple metrics it must name the metric used to pick and refit the best model
  • we pass the target array as a DataFrame here - it lets us recover the original row indices in the custom loss function
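A small sketch of why the DataFrame matters (KFold here just mimics the split that happens inside GridSearchCV):

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

yDf = pd.DataFrame(np.arange(10))
for trainIdx, testIdx in KFold(n_splits=5).split(yDf):
    # the slice keeps the original row labels, unlike a plain ndarray
    print(yDf.iloc[testIdx].index.values)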
answered Nov 16 '22 by avchauzov