I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`". I'm classifying documents, so I am also putting a tf-idf vectorizer in the pipeline. Here is the code:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV  # scikit-learn 0.17; sklearn.model_selection from 0.18 on
from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, confusion_matrix
from sklearn.pipeline import Pipeline

# Classifier Pipeline
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('classifier', RandomForestClassifier())
])

# Params for classifier
params = {"max_depth": [3, None],
          "max_features": [1, 3, 10],
          "min_samples_split": [1, 3, 10],
          "min_samples_leaf": [1, 3, 10],
          # "bootstrap": [True, False],
          "criterion": ["gini", "entropy"]}

# Grid Search Execute
rf_grid = GridSearchCV(estimator=pipeline, param_grid=params)  # cv=10
rf_detector = rf_grid.fit(X_train, Y_train)
print(rf_grid.grid_scores_)
I can't figure out why the error is showing. The same error also occurs when I run a decision tree with GridSearchCV. (scikit-learn 0.17)
ParameterGrid(param_grid): Grid of parameters with a discrete number of values for each. Can be used to iterate over parameter value combinations with the Python built-in function iter. The order of the generated parameter combinations is deterministic.
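To make the description concrete, here is a minimal sketch that expands the grid from the question with ParameterGrid. It assumes a recent scikit-learn, where the class is imported from sklearn.model_selection (in 0.17 it lives in sklearn.grid_search); the variable names are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

# The grid from the question, without any pipeline step prefix.
param_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [1, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "criterion": ["gini", "entropy"],
}

grid = ParameterGrid(param_grid)

# len() gives the number of combinations: 2 * 3 * 3 * 3 * 2.
print(len(grid))

# Iterating yields one dict per combination.
combinations = list(grid)
print(combinations[0])
```

Each dict produced here is one candidate setting that a grid search would fit and score, which is why the number of combinations matters for run time.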
Grid Search CV: GridSearchCV tries every combination of values in the parameter grid for a model and returns the combination with the best cross-validated score. The drawback is cost: a model is trained for every combination and every cross-validation fold, so run time grows quickly with the size of the grid.
Grid search is a process that searches exhaustively through a manually specified subset of the hyperparameter space of the targeted algorithm. Random search, on the other hand, selects a value for each hyperparameter independently using a probability distribution.
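The difference can be sketched with scikit-learn's ParameterGrid and ParameterSampler. This is an illustration under assumptions, not part of the original post: it uses the sklearn.model_selection import path of recent versions, and a reduced parameter space chosen here for brevity. When every entry is a plain list, as below, ParameterSampler draws uniformly from the grid; distributions (for example from scipy.stats) can be passed instead of lists.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# A reduced, illustrative parameter space.
param_space = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_leaf": [1, 3, 10],
}

# Grid search: every combination is a candidate.
exhaustive = list(ParameterGrid(param_space))

# Random search: only n_iter candidates are drawn from the same space.
sampled = list(ParameterSampler(param_space, n_iter=5, random_state=0))

print(len(exhaustive), "candidates for grid search")
print(len(sampled), "candidates for random search")
for candidate in sampled:
    print(candidate)
```

GridSearchCV and RandomizedSearchCV build on these two helpers, so the trade-off is the same: full coverage of the grid versus a fixed budget of fits.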
You have to assign the parameters to the named step in the pipeline, in your case classifier. Try prepending classifier__ (the step name followed by two underscores) to each parameter name. The parameter grid then becomes:
params = {"classifier__max_depth": [3, None],
          "classifier__max_features": [1, 3, 10],
          "classifier__min_samples_split": [1, 3, 10],
          "classifier__min_samples_leaf": [1, 3, 10],
          # "bootstrap": [True, False],
          "classifier__criterion": ["gini", "entropy"]}
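For reference, a small end-to-end sketch of the prefixed grid in use. Several things here are assumptions rather than part of the original post: the toy documents and labels are made up, the imports follow recent scikit-learn (sklearn.model_selection instead of sklearn.grid_search), the grid is trimmed so it runs quickly, and n_estimators, random_state and cv=2 are arbitrary choices for the toy data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 4 documents per class.
docs = [
    "cheap pills buy now", "win money fast now", "free prize click here",
    "buy cheap watches online", "meeting agenda for monday",
    "project report attached", "lunch with the team today",
    "notes from the design review",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# These are the only names GridSearchCV accepts for this pipeline.
valid_names = set(pipeline.get_params().keys())

# Step name, two underscores, then the estimator's own parameter name.
params = {
    "classifier__max_depth": [3, None],
    "classifier__criterion": ["gini", "entropy"],
}

rf_grid = GridSearchCV(estimator=pipeline, param_grid=params, cv=2)
rf_grid.fit(docs, labels)
print(rf_grid.best_params_)
```

Inspecting valid_names also shows why the prefix is needed: both steps have a max_features parameter (tfidf__max_features and classifier__max_features), so a bare max_features would be ambiguous.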