I'm trying to implement my own kernel regression that is compatible with the sklearn library. My implementation is the following:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import euclidean_distances
import models.kernel as ker

class MyKerReg(BaseEstimator, RegressorMixin):
    def __init__(self, kernel="gaussian", bandwidth=1.0):
        self.kernel = ker.kernel(kernel)
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X, y = check_X_y(X, y, accept_sparse=True, ensure_2d=False)
        self.is_fitted_ = True
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        X = check_array(X, accept_sparse=True, ensure_2d=False)
        check_is_fitted(self, 'is_fitted_')
        pred = []
        for x in X:
            tmp = [x - v for v in self.X_]
            ker_values = [(1/self.bandwidth)*self.kernel(v/self.bandwidth) for v in tmp]
            ker_values = np.array(ker_values)
            values = np.array(self.y_)
            num = np.dot(ker_values.T, values)
            denom = np.sum(ker_values)
            pred.append(num/denom)
        return pred
When I call predict on its own, everything works. But when I use this object in cross_val_score like this ...
y, x = misc.data_generating_process(1000)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 44)
kr = ker_reg.MyKerReg(kernel = "gaussian", bandwidth = 0.5)
print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))
... I get the following error:
Exception has occurred: RuntimeError
Cannot clone object MyKerReg(bandwidth=0.5, kernel=<models.kernel.kernel object at 0x7fab359bc940>), as the constructor either does not set or modifies parameter kernel
During handling of the above exception, another exception occurred:
File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 24, in main
print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))
File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 85, in <module>
main()
Does anyone have an idea how to fix this? I know there is a similar thread on this topic, but I still can't figure it out. Thank you all.
I've already read the documentation and articles on the topic, and it seems like I'm doing everything right.
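The __init__ method should set its parameters as attributes, with no name changes or validation. In your example, self.kernel = ker.kernel(kernel) is to blame: clone rebuilds the estimator from get_params() and then checks that each parameter on the copy is the very object that was passed in, and your constructor replaces the string with a new kernel object every time. You can probably move that into the beginning of fit instead: leave just self.kernel = kernel in __init__, put self.kernel_ = ker.kernel(self.kernel) in fit, and have predict call self.kernel_ rather than self.kernel.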
From the developer guide:
every keyword argument accepted by __init__ should correspond to an attribute on the instance. Scikit-learn relies on this to find the relevant attributes to set on an estimator when doing model selection. [...]
There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.
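To make this concrete, here is a minimal sketch of the estimator with the kernel lookup moved into fit. It rests on a few assumptions of mine: since models.kernel is not shown in the question, gaussian_kernel and the KERNELS dictionary are hypothetical stand-ins for ker.kernel; X is taken to be 2-D with shape (n_samples, n_features); and the kernel is applied to Euclidean distances, which matches the original x - v for a single feature and a symmetric kernel. The loop in predict is also replaced by a vectorized computation, so treat this as an illustration of the pattern rather than a drop-in replacement.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


def gaussian_kernel(u):
    """Hypothetical stand-in for ker.kernel("gaussian"): standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


# Hypothetical name-to-function lookup replacing models.kernel.
KERNELS = {"gaussian": gaussian_kernel}


class MyKerReg(RegressorMixin, BaseEstimator):
    def __init__(self, kernel="gaussian", bandwidth=1.0):
        # Store the parameters exactly as passed: no conversion, no validation.
        self.kernel = kernel
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Resolve the kernel name here; the trailing underscore marks
        # an attribute that only exists after fitting.
        self.kernel_ = KERNELS[self.kernel]
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        check_is_fitted(self, "kernel_")
        X = check_array(X)
        # Euclidean distance between every query point and every training point.
        dist = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        weights = self.kernel_(dist / self.bandwidth) / self.bandwidth
        # Nadaraya-Watson estimate: kernel-weighted average of the training targets.
        return weights @ self.y_ / weights.sum(axis=1)


# Synthetic data, only for demonstration.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=100)

kr = MyKerReg(kernel="gaussian", bandwidth=0.5)
cloned = clone(kr)  # no longer raises, because __init__ keeps the string as-is
scores = cross_val_score(kr, X, y, scoring="neg_mean_squared_error", cv=5)
print(scores)
```

Because kernel stays a plain string, get_params() and clone round-trip cleanly, which also means the kernel name can be tuned with GridSearchCV like any other hyperparameter.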