Grid Search with Recursive Feature Elimination in scikit-learn pipeline returns an error

I am trying to chain Grid Search and Recursive Feature Elimination in a Pipeline using scikit-learn.

GridSearchCV and RFE with a "bare" estimator work fine:

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

est = SVR(kernel="linear")

selector = feature_selection.RFE(est)
param_grid = dict(estimator__C=[0.1, 1, 10])
clf = GridSearchCV(selector, param_grid=param_grid, cv=10)
clf.fit(X, y)

Putting the estimator in a pipeline, however, raises an error: RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import preprocessing
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

est = SVR(kernel="linear")

std_scaler = preprocessing.StandardScaler()
pipe_params = [('std_scaler', std_scaler), ('clf', est)]
pipe = pipeline.Pipeline(pipe_params)

selector = feature_selection.RFE(pipe)
param_grid = dict(estimator__clf__C=[0.1, 1, 10])
clf = GridSearchCV(selector, param_grid=param_grid, cv=10)
clf.fit(X, y)

EDIT:

I have realised that I was not clear in describing the problem. Here is a clearer snippet:

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# This will work
est = SVR(kernel="linear")
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__C': [1, 10]})
clf.fit(X, y)

# This will not work
est = pipeline.make_pipeline(SVR(kernel="linear"))
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__svr__C': [1, 10]})
clf.fit(X, y)

As you can see, the only difference is that the estimator is wrapped in a pipeline. Pipeline, however, does not expose the "coef_" or "feature_importances_" attributes of its final step (see the sketch after the questions below). The questions are:

  1. Is there a nice way of dealing with this in scikit-learn?
  2. If not, is this behaviour desired for any reason?
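For illustration, here is a minimal sketch (reusing the toy data from above) of the mismatch: after fitting, the final step of the pipeline carries coef_, but the pipeline object itself does not, and the pipeline is what RFE inspects:

from sklearn.datasets import make_friedman1
from sklearn import pipeline
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

pipe = pipeline.make_pipeline(SVR(kernel="linear"))
pipe.fit(X, y)

# The fitted final step exposes the weights that RFE needs...
print(pipe.steps[-1][-1].coef_)

# ...but the pipeline object itself does not expose them, and that is what RFE checks.
print(hasattr(pipe, "coef_"))  # False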

EDIT2:

Updated, working snippet based on the answer provided by @Chris:

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR


class MyPipe(pipeline.Pipeline):

    def fit(self, X, y=None, **fit_params):
        """Calls last elements .coef_ method.
        Based on the sourcecode for decision_function(X).
        Link: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/pipeline.py
        ----------
        """
        super(MyPipe, self).fit(X, y, **fit_params)
        self.coef_ = self.steps[-1][-1].coef_
        return self


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Without Pipeline
est = SVR(kernel="linear")
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__C': [1, 10, 100]})
clf.fit(X, y)
print(clf.grid_scores_)

# With Pipeline
est = MyPipe([('svr', SVR(kernel="linear"))])
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__svr__C': [1, 10, 100]})
clf.fit(X, y)
print(clf.grid_scores_)
asked Apr 17 '16 by hubi86


2 Answers

You have an issue with your use of Pipeline.

A pipeline works as follows:

The first object is applied to the data when you call .fit(X, y). If that object exposes a .transform() method, the transform is applied and its output is used as the input for the next stage.

A pipeline can have any valid model as its final object, but all previous ones MUST expose a .transform() method.

Just like a pipe: you feed in data, and each object in the pipeline takes the previous output and applies another transform to it.
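A minimal sketch of that contract (the step names here are illustrative, not from the question):

from sklearn import pipeline, preprocessing
from sklearn.svm import SVR

# Valid: every step before the last exposes .transform(); the final step may be any model.
ok_pipe = pipeline.Pipeline([
    ('scale', preprocessing.StandardScaler()),  # transformer
    ('model', SVR(kernel="linear")),            # final estimator, no .transform() needed
])

# Invalid: SVR exposes no .transform(), so it cannot sit before another step.
# pipeline.Pipeline([('model', SVR(kernel="linear")),
#                    ('scale', preprocessing.StandardScaler())])
# -> raises a TypeError when constructed or fitted, depending on the scikit-learn version.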

As we can see,

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE.fit_transform

RFE exposes a transform method, and so should be included in the pipeline itself. E.g.

some_sklearn_model = RandomForestClassifier()
selector = feature_selection.RFE(some_sklearn_model)
pipe_params = [('std_scaler', std_scaler), ('RFE', selector), ('clf', est)]

Your attempt has a few issues. Firstly, you are trying to scale a slice of your data. Imagine I had the two partitions [1, 1] and [10, 10]: if I normalise each by its own partition mean, I lose the information that my second partition is significantly above the overall mean. You should scale at the start, not in the middle (see the sketch below).
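A minimal numeric sketch of that point (the arrays are illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler

part_a = np.array([[1.0], [1.0]])
part_b = np.array([[10.0], [10.0]])

# Scaling each partition on its own maps both to zero:
# the fact that part_b sits far above part_a is lost.
print(StandardScaler().fit_transform(part_a).ravel())  # [ 0.  0.]
print(StandardScaler().fit_transform(part_b).ravel())  # [ 0.  0.]

# Scaling the full data once keeps the separation between the partitions.
print(StandardScaler().fit_transform(np.vstack([part_a, part_b])).ravel())
# [-1. -1.  1.  1.]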

Secondly, SVR does not implement a transform method, so you cannot use it as a non-final element in a pipeline.

RFE takes in a model, fits it to the data, and then evaluates the weight of each feature.
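For reference, a minimal sketch of RFE on its own (reusing the question's toy data):

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# RFE repeatedly fits the estimator and drops the weakest features,
# judged by the fitted model's coef_ (or feature_importances_).
selector = feature_selection.RFE(SVR(kernel="linear"))
selector.fit(X, y)

print(selector.support_)   # mask of the features that survived
print(selector.ranking_)   # 1 = selected; higher = eliminated earlier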

EDIT:

You can include this behaviour if you wish by wrapping the sklearn pipeline in your own class. What we want to do is, when we fit the data, retrieve the last estimator's .coef_ attribute and store it locally in our derived class under the correct name. I suggest you look into the source code on GitHub, as this is only a first start and more error checking etc. would probably be required. Sklearn uses a function decorator called @if_delegate_has_method, which would be a handy thing to add to ensure the method generalises. I have run this code to make sure it runs, but nothing more.

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import preprocessing
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR

class myPipe(pipeline.Pipeline):

    def fit(self, X, y=None, **fit_params):
        """Fit the pipeline, then expose the last step's .coef_ attribute.
        Based on the source code for decision_function(X).
        Link: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/pipeline.py
        """
        super(myPipe, self).fit(X, y, **fit_params)
        # Copy the final estimator's weights onto the pipeline itself,
        # so that RFE can find them under the name it expects.
        self.coef_ = self.steps[-1][-1].coef_
        return self

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

est = SVR(kernel="linear")

selector = feature_selection.RFE(est)
std_scaler = preprocessing.StandardScaler()
pipe_params = [('std_scaler', std_scaler),('select', selector), ('clf', est)]

pipe = myPipe(pipe_params)



selector = feature_selection.RFE(pipe)
clf = GridSearchCV(selector, param_grid={'estimator__clf__C': [2, 10]})
clf.fit(X, y)

print(clf.best_params_)

If anything is not clear, please ask.

answered by Chris


I think you constructed the pipeline in a slightly different way from what is shown in the pipeline documentation.

Are you looking for this?

from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import preprocessing
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

est = SVR(kernel="linear")

std_scaler = preprocessing.StandardScaler()
selector = feature_selection.RFE(est)
pipe_params = [('feat_selection', selector), ('std_scaler', std_scaler), ('clf', est)]
pipe = pipeline.Pipeline(pipe_params)

param_grid = dict(clf__C=[0.1, 1, 10])
clf = GridSearchCV(pipe, param_grid=param_grid, cv=2)
clf.fit(X, y)
print(clf.grid_scores_)

Also see this useful example of combining things in a pipeline. For the RFE object, I followed the official documentation to construct it with your SVR estimator, and then put the RFE object into the pipeline in the same way as you had done with the scaler and estimator objects.

answered by edesz