How to use GridSearchCV output for a scikit prediction?

In the following code:

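from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to rank features for the selection step
rf_feature_imp = RandomForestClassifier(n_estimators=100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier, trained on the selected features
clf = RandomForestClassifier(n_estimators=5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# Step-prefixed names route each parameter to the matching pipeline step
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)
gs.fit(X, y)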

What should be used for prediction?

gs,
gs.best_estimator_, or
gs.best_estimator_.named_steps['clf']?

What is the difference between these three?

asked Feb 14 '16 05:02

user308827

1 Answer

gs.predict(X_test) is equivalent to gs.best_estimator_.predict(X_test), provided the search was run with refit=True (the default). With either call, X_test is passed through the entire pipeline and the predictions are returned.

gs.best_estimator_.named_steps['clf'].predict(), however, is only the last step of the pipeline. To use it, the feature selection step must already have been applied, so it only works on data that has previously been run through gs.best_estimator_.named_steps['fs'].transform().
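To illustrate why the 'clf' step cannot be fed raw data, here is a minimal sketch. It is not the asker's exact setup: the train/test split, threshold="median", the small n_estimators, the fixed random_state, cv=3 and the reduced parameter grid are all assumptions chosen so the example runs quickly and actually drops some features.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Assumed split; the original post never defines X_test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Assumed settings: threshold="median" keeps roughly half of the features.
model = Pipeline([
    ("fs", SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0),
        threshold="median",
    )),
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])
params = {"clf__max_features": ["sqrt", "log2"]}

gs = GridSearchCV(model, params, cv=3)
gs.fit(X_train, y_train)

fs = gs.best_estimator_.named_steps["fs"]
clf = gs.best_estimator_.named_steps["clf"]

n_raw = X_test.shape[1]                     # columns in the raw data
n_selected = fs.transform(X_test).shape[1]  # columns the classifier was trained on

# The final classifier only ever saw the selected columns, so handing it
# the raw matrix is rejected with a feature-count mismatch.
try:
    clf.predict(X_test)
    raw_input_failed = False
except ValueError as exc:
    raw_input_failed = True
    print("clf on raw data failed:", exc)

print(n_raw, "raw features,", n_selected, "after selection")
```

If the selection step happened to keep every feature, the direct call would not raise, but it would still bypass part of the fitted pipeline, so going through gs or gs.best_estimator_ is the safer habit.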

Three equivalent methods for generating predictions are shown below:

Using gs directly.

pred = gs.predict(X_test)

Using best_estimator_.

pred = gs.best_estimator_.predict(X_test)

Calling each step in the pipeline individually.

X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)
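The three approaches can be checked against each other in one self-contained run. This is a sketch under assumptions that are not in the original post: the train/test split, threshold="median", the small forests, the fixed random_state, cv=3 and the reduced grid are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Assumed split; the original post never defines X_test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

model = Pipeline([
    ("fs", SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0),
        threshold="median",
    )),
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])
params = {"clf__max_features": ["sqrt", "log2"]}

# refit=True (the default) retrains the best pipeline on all of X_train,
# which is what makes gs.predict and gs.best_estimator_ available.
gs = GridSearchCV(model, params, cv=3)
gs.fit(X_train, y_train)

# 1. Using gs directly.
pred_gs = gs.predict(X_test)

# 2. Using best_estimator_.
pred_best = gs.best_estimator_.predict(X_test)

# 3. Calling each step in the pipeline individually.
X_test_fs = gs.best_estimator_.named_steps["fs"].transform(X_test)
pred_steps = gs.best_estimator_.named_steps["clf"].predict(X_test_fs)

all_equal = (
    np.array_equal(pred_gs, pred_best)
    and np.array_equal(pred_best, pred_steps)
)
print("all three agree:", all_equal)
```

In everyday use the first form is the shortest; the third is mainly useful when the intermediate, feature-selected matrix is needed for inspection.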

answered Sep 28 '22 06:09

David Maust

Tags: python, scikit-learn, grid-search
