How to use GridSearchCV output for a scikit prediction?

In the following code:

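from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Feature selection driven by random forest feature importances
rf_feature_imp = RandomForestClassifier(100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier
clf = RandomForestClassifier(5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# Note: 'auto' was removed as a max_features value in scikit-learn 1.3
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)
gs.fit(X, y)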

What should be used for a prediction?

  • gs?
  • gs.best_estimator_? or
  • gs.best_estimator_.named_steps['clf']?

What is the difference between these 3?

user308827 asked Feb 14 '16 05:02



1 Answer

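gs.predict(X_test) is equivalent to gs.best_estimator_.predict(X_test), provided the search was run with refit=True (the default), since best_estimator_ only exists after the refit. With either call, X_test is passed through the entire pipeline (feature selection, then the classifier) and the predictions are returned.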

gs.best_estimator_.named_steps['clf'].predict(), however, is only the last step of the pipeline. It expects data that has already been through feature selection, so it only works if you have first run your data through gs.best_estimator_.named_steps['fs'].transform().
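To make the point about the last step concrete, here is a minimal sketch, not the asker's exact setup. It assumes a smaller forest, a fixed threshold="median" (so some features are always kept and some dropped), cv=3, and a grid over clf__max_features only; all of those values are illustrative choices. It shows that the fitted classifier was trained on fewer columns than the raw data has, so handing it raw data raises an error.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Illustrative settings (assumptions, not from the question):
# small forests, "median" threshold, 3-fold CV.
model = Pipeline([
    ("fs", SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0),
        threshold="median",
    )),
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])
gs = GridSearchCV(model, {"clf__max_features": ["sqrt", "log2"]}, cv=3)
gs.fit(X, y)

fs = gs.best_estimator_.named_steps["fs"]
clf = gs.best_estimator_.named_steps["clf"]

n_raw = X.shape[1]                     # columns in the original data
n_selected = fs.transform(X).shape[1]  # columns kept by feature selection
print("raw:", n_raw, "selected:", n_selected, "clf expects:", clf.n_features_in_)

# The classifier alone rejects raw data because the column count differs.
try:
    clf.predict(X)
    raw_input_rejected = False
except ValueError as exc:
    raw_input_rejected = True
    print("clf.predict on raw X failed:", exc)
```

If the selector happened to keep every feature, the direct call would not fail, but it would still bypass the pipeline, which is why calling predict on gs or best_estimator_ is the safer habit.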

Three equivalent methods for generating predictions are shown below:

Using gs directly.

pred = gs.predict(X_test)

Using best_estimator_.

pred = gs.best_estimator_.predict(X_test)

Calling each step in the pipeline individually.

X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)
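
The three approaches can be checked against each other with a short, self-contained sketch. The train/test split, the smaller forests, cv=3, and the string thresholds "mean" and "median" are assumptions made for this example (the numeric thresholds in the question can end up selecting no features on iris); X_test here is simply the held-out part of the split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
# Hypothetical hold-out split so that an X_test exists.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = Pipeline([
    ("fs", SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Illustrative grid; string thresholds always keep at least one feature.
params = {
    "fs__threshold": ["mean", "median"],
    "clf__max_features": ["sqrt", "log2"],
}

gs = GridSearchCV(model, params, cv=3)  # refit=True by default
gs.fit(X_train, y_train)

# 1. Using gs directly.
pred_gs = gs.predict(X_test)

# 2. Using best_estimator_ (the pipeline refit with the best parameters).
pred_best = gs.best_estimator_.predict(X_test)

# 3. Calling each pipeline step individually.
steps = gs.best_estimator_.named_steps
X_test_fs = steps["fs"].transform(X_test)
pred_steps = steps["clf"].predict(X_test_fs)

print("best params:", gs.best_params_)
print("1 == 2:", np.array_equal(pred_gs, pred_best))
print("1 == 3:", np.array_equal(pred_gs, pred_steps))
```

All three go through the same fitted objects, so the predictions should match; the first form is the shortest and keeps the preprocessing attached to the model.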
David Maust answered Sep 28 '22 06:09