
RandomForestRegressor and feature_importances_ error

I am struggling to pull the feature importances out of my RandomForestRegressor. I get this error:

AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'.

Does anyone know why the attribute is missing? According to the documentation, it should exist.

The full code:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Run a GridSearchCV over a RandomForestRegressor to tune the model.
parameter_candidates = {
    'n_estimators': [650, 700, 750, 800],
    'min_samples_leaf': [1, 2, 3],
    'max_depth': [10, 11, 12],
    'min_samples_split': [2, 3, 4, 5, 6],
}

RFR_regr = RandomForestRegressor()
CV_RFR_regr = GridSearchCV(estimator=RFR_regr, param_grid=parameter_candidates, n_jobs=5, verbose=2)
CV_RFR_regr.fit(X_train, y_train)

# Predict with the testing set
y_pred = CV_RFR_regr.predict(X_test)

# Extract feature importances (this line raises the AttributeError)
importances = CV_RFR_regr.feature_importances_
Svarto asked Nov 04 '17 13:11



1 Answer

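You are trying to read the attribute from the GridSearchCV object, where it does not exist. GridSearchCV only wraps the estimator being tuned; after fitting, the model refitted with the best parameters is exposed as best_estimator_ (available because refit=True is the default), and that is the object that carries feature_importances_.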

Access the attribute like this:

importances = CV_RFR_regr.best_estimator_.feature_importances_
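
For a version that can be run end to end, here is a minimal sketch. The synthetic data from make_regression and the much smaller parameter grid are assumptions made only so the example finishes quickly; they stand in for the X_train, y_train and grid from the question. It checks that the search object lacks the attribute, then reads it from best_estimator_ and prints the features from most to least important.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the asker's data (an assumption, not the real dataset).
X, y = make_regression(n_samples=200, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deliberately tiny grid so the sketch runs in a few seconds.
parameter_candidates = {
    'n_estimators': [10, 20],
    'max_depth': [3, 5],
}

CV_RFR_regr = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=parameter_candidates,
    cv=3,
)
CV_RFR_regr.fit(X_train, y_train)

# The search object itself does not expose feature_importances_ ...
has_attr_on_search = hasattr(CV_RFR_regr, 'feature_importances_')

# ... but the refitted best model does (requires refit=True, the default).
importances = CV_RFR_regr.best_estimator_.feature_importances_

# Print features ordered from most to least important.
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

If the data comes from a pandas DataFrame, the column names can be used in place of the integer indices when printing.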
Vivek Kumar answered Oct 14 '22 13:10