How to implement a meta-estimator with the scikit-learn API?

Tags: python, scikit-learn

My main question is where to put the threshold. I want it to be learned only once and then reused in subsequent .fit calls with new data, without being readjusted. With the current version it is retuned on every .fit call, which I do not want.

On the other hand, if I make it a fixed parameter self.threshold that is passed to __init__, then I am not supposed to change it based on the data, am I?

How can I make a threshold parameter which can be tuned in one call of .fit and be fixed for subsequent .fit calls?

asked Nov 11 '19 15:11

Gerenuk

1 Answer

I actually wrote a blog post about this the other day. I assume you are trying to build something similar to TransformedTargetRegressor; I would suggest taking a look at its source code as a starting point.
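For orientation, here is a minimal sketch of how TransformedTargetRegressor behaves as a meta-estimator. The toy data and the choice of LinearRegression with a log/exp transform are assumptions made purely for illustration; the point is only that fit leaves the estimator passed to __init__ untouched and stores the fitted copy in regressor_.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Toy data (an assumption for this sketch): a strictly positive target,
# so taking the log is safe.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(50, 1))
y = np.exp(2.0 * X[:, 0])

inner = LinearRegression()
model = TransformedTargetRegressor(regressor=inner, func=np.log, inverse_func=np.exp)
model.fit(X, y)

# The object passed to __init__ stays unfitted; the fitted clone lives in regressor_.
inner_is_fitted = hasattr(inner, "coef_")
fitted_copy_exists = hasattr(model.regressor_, "coef_")
predictions = model.predict(X)
```

A custom meta-estimator can follow the same pattern: constructor arguments stay as given, and everything learned from data goes into attributes with a trailing underscore.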

Your current implementation seems about right. As far as this concern goes:

How can I make a threshold parameter which can be tuned in one call of .fit and be fixed for subsequent .fit calls?

I would advise against that, because scikit-learn's API is built around the fit method re-fitting every tunable aspect of the model. There are two routes you can take here: either add a keyword argument to fit that explicitly protects the threshold from being updated, or go with what @rotem-tal suggested. If you choose the latter, it might look something like this:

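import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

def optimal_threshold(y_raw: np.ndarray) -> np.ndarray:
    return np.array([0.1, 0.5, 1])  # some implementation here

# Mixins go to the left of BaseEstimator in the inheritance list
class Thresholder(ClassifierMixin, BaseEstimator):
    def __init__(self, regressor):
        # __init__ only stores constructor parameters; learned state is set in fit
        self.regressor = regressor

    def fit(self, X, y):
        # This fits the wrapped regressor in place; TransformedTargetRegressor
        # fits clone(self.regressor) instead, leaving the original untouched
        self.regressor.fit(X, y)

        y_raw = self.regressor.predict(X)
        # Learn the threshold on the first fit only; later fits reuse it
        if not hasattr(self, "threshold_"):
            self.threshold_ = optimal_threshold(y_raw)

        return self

    def predict(self, X):
        y_raw = self.regressor.predict(X)

        # threshold_ is already an array of bin edges
        y = np.digitize(y_raw, self.threshold_)

        return y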
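The other route, a keyword argument on fit, could be sketched roughly as follows. Note that keep_threshold, FlaggedThresholder and median_threshold are hypothetical names invented for this example and are not part of scikit-learn; the median rule is only a stand-in for a real threshold search.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LinearRegression


def median_threshold(y_raw):
    # Hypothetical stand-in for a real threshold search: one cut at the median.
    return np.array([np.median(y_raw)])


class FlaggedThresholder(ClassifierMixin, BaseEstimator):
    def __init__(self, regressor):
        self.regressor = regressor

    def fit(self, X, y, keep_threshold=False):
        # keep_threshold is a hypothetical keyword, not a scikit-learn convention.
        self.regressor.fit(X, y)
        # Relearn unless the caller asked to keep an already learned threshold.
        if not (keep_threshold and hasattr(self, "threshold_")):
            self.threshold_ = median_threshold(self.regressor.predict(X))
        return self

    def predict(self, X):
        return np.digitize(self.regressor.predict(X), self.threshold_)


# Two toy batches with clearly different target ranges (assumed data).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(40, 1))
y1 = 3.0 * X1[:, 0]
X2 = rng.normal(loc=5.0, size=(40, 1))
y2 = 3.0 * X2[:, 0]

clf = FlaggedThresholder(LinearRegression()).fit(X1, y1)
first = clf.threshold_.copy()

clf.fit(X2, y2, keep_threshold=True)  # regressor refits, threshold is kept
kept = clf.threshold_.copy()

clf.fit(X2, y2)  # default behaviour: threshold is relearned
relearned = clf.threshold_.copy()
```

Keeping the default at keep_threshold=False means a plain fit(X, y) still re-fits everything, which is what tools such as Pipeline or GridSearchCV expect; the frozen behaviour has to be requested explicitly.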

answered Sep 21 '22 15:09

Adithya
