
Combining Recursive Feature Elimination and Grid Search in scikit-learn

I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would rather run RFECV first and the grid search afterwards. The problem is that when I pass the selector from RFECV to the grid search, it does not accept it:

ValueError: Invalid parameter bootstrap for estimator RFECV

Is it possible to get the selector from RFECV and pass it directly to RandomizedSearchCV, or is this procedurally not the right thing to do?

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint as sp_randint

# Build a classification task using 5 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=5, n_redundant=2, n_repeated=0, n_classes=8, n_clusters_per_class=1, random_state=0)

grid = {"max_depth": [3, None],
        "min_samples_split": sp_randint(1, 11),
        "min_samples_leaf": sp_randint(1, 11),
        "bootstrap": [True, False],
        "criterion": ["gini", "entropy"]}

# Thin wrapper (assumed from the original setup) that copies feature_importances_
# to coef_ after fitting, so RFECV can rank the forest's features.
class RandomForestClassifierCoef(RandomForestClassifier):
    def fit(self, *args, **kwargs):
        super(RandomForestClassifierCoef, self).fit(*args, **kwargs)
        self.coef_ = self.feature_importances_

estimator = RandomForestClassifierCoef()
clf = RandomizedSearchCV(estimator, param_distributions=grid, cv=7)
clf.fit(X, y)
estimator = clf.best_estimator_

selector = RFECV(estimator, step=1, cv=4)
selector.fit(X, y)
selector.grid_scores_
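
For reference, the failing order looks roughly like this (a sketch reusing grid and the data above, not the exact code):

# Sketch of the reversed order: build RFECV first, then hand it to the search.
# RandomizedSearchCV tries to set e.g. "bootstrap" directly on the RFECV wrapper,
# which only knows that parameter as "estimator__bootstrap", hence the ValueError.
base = RandomForestClassifierCoef()
selector = RFECV(base, step=1, cv=4)
search = RandomizedSearchCV(selector, param_distributions=grid, cv=7)
search.fit(X, y)  # raises ValueError: Invalid parameter bootstrap for estimator RFECV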
asked Aug 25 '15 by Mark Conway



1 Answer

The best way to do this would be to nest the RFECV inside the random search, using the method from this SO answer. Some example code, based on the question code and the SO answer mentioned above:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint as sp_randint

# Build a classification task using 5 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=5, n_redundant=2, n_repeated=0, n_classes=8, n_clusters_per_class=1, random_state=0)

grid = {"estimator__max_depth": [3, None],
        "estimator__min_samples_split": sp_randint(1, 11),
        "estimator__min_samples_leaf": sp_randint(1, 11),
        "estimator__bootstrap": [True, False],
        "estimator__criterion": ["gini", "entropy"]}

estimator = RandomForestClassifier()
selector = RFECV(estimator, step=1, cv=4)
clf = RandomizedSearchCV(selector, param_distributions=grid, cv=7)
clf.fit(X, y)
print(clf.grid_scores_)
print(clf.best_estimator_.n_features_)
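
The estimator__ prefix is what makes this work: RFECV exposes the wrapped forest's parameters under that prefix (you can confirm with selector.get_params().keys()), so the search tunes the forest while RFECV performs the feature elimination inside every fit. On current scikit-learn releases, where sklearn.grid_search and the grid_scores_ attribute have been removed, a roughly equivalent sketch (untested, using sklearn.model_selection, cv_results_, and a min_samples_split lower bound of 2 as modern versions require) would be:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint

X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# Parameter names are prefixed with estimator__ so they reach the forest
# wrapped inside RFECV rather than RFECV itself.
grid = {"estimator__max_depth": [3, None],
        "estimator__min_samples_split": sp_randint(2, 11),
        "estimator__min_samples_leaf": sp_randint(1, 11),
        "estimator__bootstrap": [True, False],
        "estimator__criterion": ["gini", "entropy"]}

selector = RFECV(RandomForestClassifier(), step=1, cv=4)
search = RandomizedSearchCV(selector, param_distributions=grid, cv=7, random_state=0)
search.fit(X, y)

print(search.cv_results_["mean_test_score"])   # replaces grid_scores_
print(search.best_estimator_.n_features_)      # features kept by the best RFECV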
answered Oct 03 '22 by hugo