 

Random Forest with GridSearchCV - Error on param_grid

I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`". I'm classifying documents, so I am also adding a tf-idf vectorizer to the pipeline. Here is the code:

    from sklearn import metrics
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    # sklearn.grid_search is the 0.17 location; 0.18+ uses sklearn.model_selection
    from sklearn.grid_search import GridSearchCV
    from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, confusion_matrix
    from sklearn.pipeline import Pipeline

    # Classifier Pipeline
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('classifier', RandomForestClassifier())
    ])

    # Params for classifier
    params = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [1, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              # "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

    # Grid Search Execute
    rf_grid = GridSearchCV(estimator=pipeline, param_grid=params)  # cv=10
    rf_detector = rf_grid.fit(X_train, Y_train)
    print(rf_grid.grid_scores_)

I can't figure out why the error is showing. The same error occurs when I run a decision tree with GridSearchCV. (Scikit-learn 0.17)

OAK asked Jan 19 '16 23:01



People also ask

What is param_grid?

ParameterGrid(param_grid) is a grid of parameters with a discrete number of values for each. It can be used to iterate over parameter value combinations with the Python built-in function iter. The order of the generated parameter combinations is deterministic.
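As a rough illustration of that description, the sketch below iterates over a small grid. The two-parameter grid is made up for the example, and the import path assumes scikit-learn 0.18 or later (in 0.17 the class lived in sklearn.grid_search).

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical two-parameter grid, chosen only for illustration.
param_grid = {"max_depth": [3, None], "criterion": ["gini", "entropy"]}

grid = ParameterGrid(param_grid)

# Each item is a dict holding one combination of parameter values.
combinations = list(grid)
for combo in combinations:
    print(combo)

# The grid's length is the product of the list lengths: 2 * 2 = 4.
n_combinations = len(grid)
```

Iterating over the same grid a second time should yield the combinations in the same order, which is what "deterministic" refers to.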

How does grid search work with cross validation?

Grid search with cross-validation (GridSearchCV) tries every combination in the parameter grid, scores each one by cross-validation, and returns the combination with the best mean score. The drawback is cost: the number of model fits is the number of combinations times the number of folds. The grid in the question above, for example, has 2 × 3 × 3 × 3 × 2 = 108 combinations, so 10-fold cross-validation would fit 1,080 models.

What is grid search method?

Grid search is a process that searches exhaustively through a manually specified subset of the hyperparameter space of the targeted algorithm. Random search, on the other hand, selects a value for each hyperparameter independently using a probability distribution.
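To make the contrast with random search concrete, here is a minimal sketch using ParameterSampler. The search space below is hypothetical, and scipy.stats.randint is just one possible choice of distribution; the import path assumes scikit-learn 0.18 or later.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Hypothetical search space: a list is sampled uniformly, while an object
# with an rvs() method (here a scipy distribution) is sampled from directly.
param_distributions = {
    "max_depth": [3, None],
    "max_features": randint(1, 11),  # integers 1 through 10
}

# Draw 5 candidate settings; random_state makes the draw repeatable.
sampler = ParameterSampler(param_distributions, n_iter=5, random_state=0)
random_candidates = list(sampler)
for candidate in random_candidates:
    print(candidate)
```

The number of candidates is fixed by n_iter rather than by the size of the space, which is the usual reason to prefer random search when a full grid would be too large.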


1 Answer

You have to address each parameter to the named step in the pipeline, in your case classifier. Prepend classifier__ (the step name followed by two underscores) to each parameter name:

    params = {"classifier__max_depth": [3, None],
              "classifier__max_features": [1, 3, 10],
              "classifier__min_samples_split": [1, 3, 10],
              "classifier__min_samples_leaf": [1, 3, 10],
              # "bootstrap": [True, False],
              "classifier__criterion": ["gini", "entropy"]}
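For reference, here is a minimal end-to-end sketch of the prefixed grid in use. It is not the asker's setup: the toy documents and labels are made up, the grid is deliberately small, and the imports assume scikit-learn 0.18 or later, where GridSearchCV lives in sklearn.model_selection. Newer releases also reject min_samples_split=1, so the sketch starts that list at 2.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for X_train / Y_train.
docs = [
    "cheap pills buy now", "win money now",
    "buy cheap watches", "free money offer",
    "meeting at noon", "project status report",
    "lunch meeting tomorrow", "quarterly report attached",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Valid keys look like "<step name>__<parameter name>".
available = sorted(pipeline.get_params().keys())
print([name for name in available if name.startswith("classifier__max")])

# Deliberately small grid; the values are illustrative, not tuned.
params = {
    "classifier__max_depth": [3, None],
    "classifier__min_samples_split": [2, 3],
    "classifier__criterion": ["gini", "entropy"],
}

rf_grid = GridSearchCV(estimator=pipeline, param_grid=params, cv=2)
rf_grid.fit(docs, labels)
print(rf_grid.best_params_)
```

The same prefix rule applies to the vectorizer: a key such as tfidf__ngram_range would tune the TfidfVectorizer step in the same search. In these newer releases the per-candidate scores are exposed as cv_results_ rather than grid_scores_.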
Kevin answered Sep 17 '22 12:09
