 

How to write a custom estimator in sklearn and use cross-validation on it?

I would like to check the prediction error of a new method through cross-validation. I would like to know whether I can pass my method to the cross-validation function of sklearn and, if so, how.

I would like something like sklearn.cross_validation(cv=10).mymethod.

I also need to know how to define mymethod: should it be a function, and what should its inputs and outputs be?

For example, we can consider as mymethod an implementation of the least squares estimator (of course not the one already in sklearn).

I found this tutorial link but it is not very clear to me.

In the documentation they use

>>> import numpy as np
>>> from sklearn import cross_validation
>>> from sklearn import datasets
>>> from sklearn import svm

>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))

>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
...    clf, iris.data, iris.target, cv=5)
...
>>> scores

But the problem is that the estimator clf they use is built into sklearn. How should I define my own estimator so that I can pass it to the cross_validation.cross_val_score function?

So, for example, suppose a simple estimator that uses a linear model $y = X\beta$, where $\beta$ is estimated as X[1,:] + alpha and alpha is a parameter. How should I complete the code?

class my_estimator():
    def fit(X, y):
        beta = X[1,:] + alpha  # where can I pass alpha to the function?
        return beta
    def scorer(estimator, X, y):  # what should the scorer function compute?
        return ?????

With the following code I received an error:

class my_estimator():
    def fit(X, y, **kwargs):
        #alpha = kwargs['alpha']
        beta = X[1,:]  #+alpha
        return beta

>>> cv = cross_validation.cross_val_score(my_estimator, x, y, scoring="mean_squared_error")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
>>>
Donbeo asked Dec 02 '13



People also ask

How do you cross validate with sklearn?

The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.
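The code snippet quoted with this answer is cut off; a minimal sketch of the full call, based on the same iris example quoted earlier in the question (note this answer refers to the newer sklearn.model_selection module rather than the sklearn.cross_validation module used in the question):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)            # any estimator with fit/predict works here
scores = cross_val_score(clf, iris.data, iris.target, cv=5)  # 5-fold cross-validation
print(scores)                                  # one score per fold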

What is estimator in cross-validation score?

Given an estimator, the cross-validation object and the input dataset, cross_val_score repeatedly splits the data into a training and a testing set, trains the estimator on the training set and computes the score on the testing set for each iteration of cross-validation.
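As a rough sketch of what that means under the hood (simplified: for classifiers cross_val_score actually uses stratified folds by default and handles scorers more generally):

import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import KFold

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC(kernel='linear', C=1)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf.fit(X[train_idx], y[train_idx])                 # train on the training fold
    scores.append(clf.score(X[test_idx], y[test_idx]))  # evaluate on the held-out fold
print(np.array(scores))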

Does GridSearchCV do cross-validation?

Yes, GridSearchCV performs cross-validation. If I understand the concept correctly, you want to keep part of your data set unseen by the model in order to test it, so you train your models on a training data set and test them on a separate testing data set.
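For instance, a small sketch of a grid search in which every parameter combination is scored with 5-fold cross-validation (the parameter values here are only illustrative):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

search = GridSearchCV(svm.SVC(), param_grid, cv=5)  # 5-fold CV for each candidate
search.fit(iris.data, iris.target)                  # refits the best candidate on all the data
print(search.best_params_, search.best_score_)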


1 Answer

The answer also lies in sklearn's documentation.

You need to define two things:

  • an estimator that implements a fit(X, y) method, where X is the matrix of inputs and y is the vector of outputs

  • a scorer: a function or callable object that can be called as scorer(estimator, X, y) and returns the score of the given model

Referring to your example: first of all, scorer shouldn't be a method of the estimator; it is a separate notion. Just create a callable:

def scorer(estimator, X, y):
    return ?????  # compute whatever you want; it's up to you to define
                  # what it means for the given estimator to be "good" or "bad"

Or, an even simpler solution: you can pass the string 'mean_squared_error' or 'accuracy' (the full list is available in this part of the documentation) to the cross_val_score function to use a predefined scorer.
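For example, reusing the SVC from the question with a predefined scorer selected by name (a sketch using the same sklearn 0.14-era cross_validation module as the rest of this thread):

from sklearn import cross_validation, datasets, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
# scoring='accuracy' picks one of the predefined scorers by its string name
scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5,
                                          scoring='accuracy')
print(scores)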

Another possibility is to use the make_scorer factory function.
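A sketch of make_scorer wrapping an ordinary metric (greater_is_better=False because a smaller squared error means a better model, so the reported scores are negated by convention; the data here is made up purely for illustration):

import numpy as np
from sklearn import cross_validation, linear_model
from sklearn.metrics import make_scorer, mean_squared_error

mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

rng = np.random.RandomState(0)
X = rng.rand(20, 3)                          # toy regression data, illustration only
y = X.sum(axis=1) + 0.01 * rng.randn(20)

scores = cross_validation.cross_val_score(linear_model.LinearRegression(), X, y,
                                          cv=5, scoring=mse_scorer)
print(scores)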

As for the second thing, you can pass parameters to your model through the fit_params dict parameter of the cross_val_score function (as mentioned in the documentation). These parameters will be passed to the fit function.

class my_estimator():
    def fit(X, y, **kwargs):
        alpha = kwargs['alpha']
        beta = X[1,:] + alpha
        return beta

After reading all the error messages, which give a quite clear idea of what's missing, here is a simple example:

import numpy as np
from sklearn.cross_validation import cross_val_score

class RegularizedRegressor:
    def __init__(self, l = 0.01):
        self.l = l

    def combine(self, inputs):
        # dot product of [1, inputs...] with the learned weights
        return sum([i*w for (i,w) in zip([1] + inputs, self.weights)])

    def predict(self, X):
        return [self.combine(x) for x in X]

    def classify(self, inputs):
        return np.sign(self.predict(inputs))  # np.sign, not the undefined bare sign

    def fit(self, X, y, **kwargs):
        self.l = kwargs['l']          # received through fit_params
        X = np.matrix(X)
        y = np.matrix(y)
        # ordinary least squares via the normal equations
        W = (X.transpose() * X).getI() * X.transpose() * y

        self.weights = [w[0] for w in W.tolist()]

    def get_params(self, deep = False):
        # required so that sklearn can clone the estimator
        return {'l':self.l}

X = np.matrix([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.matrix([0, 1, 1, 0]).transpose()

print cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l':0.1},
                      scoring = 'mean_squared_error')
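As a side note, a sketch of an alternative to writing get_params by hand: inherit from sklearn.base.BaseEstimator, which supplies get_params/set_params automatically as long as every constructor argument is stored in an attribute of the same name. The toy rule below is the hypothetical beta = X[1,:] + alpha model from the question, not part of the original answer:

import numpy as np
from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    def __init__(self, alpha=0.0):
        self.alpha = alpha               # picked up automatically by get_params()

    def fit(self, X, y):
        X = np.asarray(X)
        # toy rule from the question: beta = X[1, :] + alpha
        self.beta_ = X[1, :] + self.alpha
        return self                      # sklearn convention: fit returns self

    def predict(self, X):
        return np.asarray(X).dot(self.beta_)   # y = X * beta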
BartoszKP answered Sep 19 '22
