Custom scikit-learn scorer can't access mean after fit

I am trying to create a custom estimator based on scikit-learn. I have written the dummy code below to explain my problem. In the score method, I am trying to access mean_, which is calculated in fit, but I am unable to. What am I doing wrong? I have tried many things and followed three or four articles, but I didn't find the issue.

I have read the documentation and made a few changes, but nothing worked. I have also tried inheriting from BaseEstimator and ClassifierMixin, but that didn't work either.

This is a dummy program. Don't go by what it is trying to do.

import numpy as np
from sklearn.model_selection import cross_val_score


class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, *x):
        print(self.mean_)  # errors out, mean_ and std_ are wiped out
        if len(x[1]) > 50:
            return 1.0
        else:
            return 0.5


model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))

Error:

AttributeError: 'FilterElems' object has no attribute 'mean_'

asked Feb 19 '20 at 06:02 by ggaurav


1 Answer

You are almost there.

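The signature of a scorer is scorer(estimator, X, y). For each fold, cross_val_score clones your estimator, fits the clone on the training split, and then calls the scorer with that fitted clone as the first argument, followed by the test X (and the test y when y is not None). Because you passed model.score, a method bound to the original model, self refers to the original model, which is never fitted, so self.mean_ does not exist. Since your score takes variable arguments, the fitted estimator ends up in x[0].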
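To see why self has no mean_, the sketch below mimics a single fold by hand. It assumes FilterElems inherits from BaseEstimator (not the case in the question; this is an assumption that gives clone a standard get_params) and reproduces only fit. sklearn.base.clone builds a new, unfitted copy from the constructor parameters, which is roughly what cross_val_score does before fitting each fold; this is an approximation, not the library's exact code path.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class FilterElems(BaseEstimator):
    # Assumption: BaseEstimator base class; only fit is reproduced here.
    def __init__(self, thres=0.5):
        self.thres = thres

    def fit(self, X, y=None):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self


X = np.random.randint(1, 1000, (100, 100))
model = FilterElems(thres=0.5)

# Roughly what happens per fold: clone the unfitted estimator,
# then fit the clone on the training rows.
fold_estimator = clone(model).fit(X[:80])

print(hasattr(fold_estimator, "mean_"))  # True: the clone was fitted
print(hasattr(model, "mean_"))           # False: the original never was
print(fold_estimator is model)           # False: two different objects
```

A bound method such as model.score always carries the original model as self, which is why the fitted attributes have to be read from the estimator argument instead.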

Change your score method to:

def score(self, *x):
    # x[0] is the fitted clone passed in by cross_val_score; x[1] is the test X
    print(x[0].mean_)
    if len(x[1]) > 50:
        return 1.0
    else:
        return 0.5
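A scorer does not have to be a method of the estimator at all. Below is a minimal sketch, assuming a plain module-level function (the name filter_scorer is hypothetical) with the (estimator, X, y=None) signature and a BaseEstimator base class for FilterElems (also an assumption). cross_val_score hands the function the fitted clone, so the fitted attributes are read from estimator rather than self. error_score="raise" is passed so that a failing scorer raises instead of silently producing nan scores.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score


class FilterElems(BaseEstimator):
    # Assumption: BaseEstimator base class; only fit is reproduced here.
    def __init__(self, thres=0.5):
        self.thres = thres

    def fit(self, X, y=None):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self


def filter_scorer(estimator, X, y=None):
    """Hypothetical scorer: `estimator` is the clone fitted on the training split."""
    print(estimator.mean_, estimator.std_)
    return 1.0 if len(X) > 50 else 0.5


X = np.random.randint(1, 1000, (100, 100))
scores = cross_val_score(FilterElems(thres=0.5), X, None,
                         scoring=filter_scorer, cv=5, error_score="raise")
print(scores)  # every test fold has 20 rows, so each score is 0.5
```

Keeping the scorer outside the class also avoids giving score a non-standard signature; other scikit-learn tools expect an estimator's score method to look like score(X, y).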

Working code:

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score

class FilterElems(BaseEstimator):  # BaseEstimator is the recommended base for custom estimators
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, estimator, *x):
        # estimator: the fitted clone passed in by cross_val_score
        # x[0]: the test X (y is None here, so nothing follows it)
        print(estimator.mean_, estimator.std_)
        if len(x[0]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))

Output (the means and standard deviations differ from run to run because the data are random):

504.750125 288.84916035447355
501.7295 289.47825925231416
503.743375 288.8964170227962
503.0325 287.8292687406025
500.041 289.3488678377712
[0.5 0.5 0.5 0.5 0.5]
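Alternatively, if the scoring logic belongs to the estimator, one option is to give score the conventional score(self, X, y=None) signature and leave scoring unset. When scoring is None, cross_val_score falls back to the estimator's own score method and calls it on each fitted clone, so self.mean_ is available there. This is a sketch assuming a BaseEstimator base class for FilterElems (not in the original) and reproducing only fit and score; error_score="raise" makes scoring failures raise instead of yielding nan.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score


class FilterElems(BaseEstimator):
    # Assumption: BaseEstimator base class; predict is omitted for brevity.
    def __init__(self, thres=0.5):
        self.thres = thres

    def fit(self, X, y=None):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def score(self, X, y=None):
        # With scoring=None this is called on the fitted clone,
        # so self.mean_ and self.std_ exist here.
        print(self.mean_, self.std_)
        return 1.0 if len(X) > 50 else 0.5


X = np.random.randint(1, 1000, (100, 100))
scores = cross_val_score(FilterElems(thres=0.5), X, None, cv=5,
                         error_score="raise")
print(scores)  # each test fold has 20 rows, so each score is 0.5
```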
answered Sep 17 '22 at 17:09 by mujjiga