I am trying to create a custom estimator based on scikit-learn. I have written the dummy code below to explain my problem. In the score method, I am trying to access mean_, which is calculated in fit, but I am unable to. What am I doing wrong? I have tried many things, referring to three or four articles, but couldn't find the issue.
I have read the documentation and made a few changes, but nothing worked. I have also tried inheriting BaseEstimator and ClassifierMixin, but that also didn't work.
This is a dummy program; don't go by what it is trying to do.
import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        # return sign(self.predict(inputs))
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, *x):
        print(self.mean_)  # errors out, mean_ and std_ are wiped out
        if len(x[1]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))
Error:
AttributeError: 'FilterElems' object has no attribute 'mean_'
The fit method trains the algorithm on the training data after the model is initialized; that's really all it does. So sklearn's fit method uses the training data as input to train the machine learning model.
fit takes two parameters: X, your data samples, where each row is one datapoint (a sample, an N-dimensional feature vector), and y, the datapoint labels, one per datapoint.
fit adjusts the model's weights according to the data values so that better accuracy can be achieved. After training, the model can be used for predictions, using .predict().
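For example, the same fit/predict pattern with a stock estimator (a minimal sketch using scikit-learn's LinearRegression; the data here is made up just to show the pattern):

import numpy as np
from sklearn.linear_model import LinearRegression

# three samples (rows), one feature each, with one label per sample
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

model = LinearRegression()
model.fit(X, y)                # learn the coefficients from the training data
print(model.predict([[4.0]]))  # predict with the fitted model -> ~[8.]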
You are almost there.
The signature for a scorer is scorer(estimator, X, y). cross_val_score calls the scorer by passing the fitted estimator object as the first parameter. Since your scorer is a variable-argument function, the first item in x holds the estimator, not X.
Change your score to:
def score(self, *x):
    print(x[0].mean_)
    if len(x[1]) > 50:
        return 1.0
    else:
        return 0.5
Working code
import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        # learned attributes are stored on the fitted clone
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, estimator, *x):
        # cross_val_score passes the fitted estimator first, then X (and y)
        print(estimator.mean_, estimator.std_)
        if len(x[0]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))
Output
504.750125 288.84916035447355
501.7295 289.47825925231416
503.743375 288.8964170227962
503.0325 287.8292687406025
500.041 289.3488678377712
[0.5 0.5 0.5 0.5 0.5]
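As a side note, you can avoid the unusual scorer signature altogether by keeping the conventional score(self, X, y=None) method and omitting the scoring argument; I believe cross_val_score then falls back to calling the fitted estimator's own score method on each fold, but double-check this against your scikit-learn version. A sketch of the same dummy example:

import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, X, y=None):
        # called on the fitted clone, so mean_ and std_ are available here
        print(self.mean_, self.std_)
        return 1.0 if len(X) > 50 else 0.5

print(cross_val_score(FilterElems(thres=0.5),
                      np.random.randint(1, 1000, (100, 100)),
                      cv=5))  # no scoring= -> uses FilterElems.score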