Mapping column names to random forest feature importances

Tags:

python

pandas

I am trying to plot feature importances for a random forest model and map each feature importance back to the original variable name. I've managed to create a plot that shows the importances and uses the original variable names as labels, but right now it orders the variable names as they appear in the dataset rather than by importance. How do I order them by feature importance? Thanks!

My code is:

import numpy as np
import matplotlib.pyplot as plt

# brf: the fitted random forest
importances = brf.feature_importances_
# Spread of each feature's importance across the individual trees
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
             axis=0)
# Feature indices sorted from most to least important
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the forest (bars are in sorted order)
plt.figure(figsize=(8, 8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
        color="r", yerr=std[indices], align="center")
feature_names = x_dummies.columns
# Tick labels are taken in the original column order
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()
yogz123 asked Jan 27 '17 18:01



People also ask

How are feature Importances calculated in random forest?

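Feature importance (in scikit-learn, the mean decrease in impurity) is calculated as the decrease in node impurity weighted by the probability of reaching that node, summed over every node that splits on the feature; for a forest, these per-tree values are averaged over the trees. The node probability is the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature.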
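To make the weighting concrete, here is a minimal sketch that computes these impurity-decrease importances by hand for a tiny, hypothetical two-split tree using Gini impurity (scikit-learn's default criterion for classifiers). The tree structure, class counts and helper names are invented for illustration. Like scikit-learn, the sketch normalizes the importances of a single tree so they sum to 1; a forest would then average such per-tree vectors.

```python
import numpy as np


def gini(counts):
    """Gini impurity of a node from its per-class sample counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)


# Hypothetical toy tree (invented for illustration):
# class_counts[node] = per-class sample counts reaching that node
class_counts = {0: [5, 5], 1: [5, 1], 2: [0, 4], 3: [5, 0], 4: [0, 1]}
# splits[node] = (feature used, left child, right child); leaves have no entry
splits = {0: (0, 1, 2), 1: (1, 3, 4)}


def weighted_impurity(node, n_total):
    """Probability of reaching `node` times the node's Gini impurity."""
    return sum(class_counts[node]) / n_total * gini(class_counts[node])


def mdi_importances(n_features):
    n_total = sum(class_counts[0])  # every sample reaches the root
    imp = np.zeros(n_features)
    for node, (feature, left, right) in splits.items():
        # Weighted impurity decrease produced by this split
        imp[feature] += (weighted_impurity(node, n_total)
                         - weighted_impurity(left, n_total)
                         - weighted_impurity(right, n_total))
    return imp / imp.sum()  # normalize so the importances sum to 1


importances = mdi_importances(n_features=2)
print(importances)  # [0.66666667 0.33333333]
```

Feature 0 ends up with two thirds of the importance because its split at the root removes more weighted impurity than feature 1's split, which only sees 6 of the 10 samples.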

What does importance mean in random forest?

After training a random forest, it is natural to ask which variables have the most predictive power. Variables with high importance contribute most to the model's predictions; a high importance does not by itself show that a variable causally drives the outcome.

What is feature importance in machine learning?

Feature (variable) importance indicates how much each feature contributes to the model's predictions. In other words, it measures how useful a specific variable is to the current model and its predictions.


1 Answer

A sort of generic solution would be to throw the features/importances into a dataframe and sort them before plotting:

import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline  (IPython/Jupyter magic; only needed in a notebook)
# do code to support model
# "data" is the X dataframe and "model" is the fitted sklearn object

feats = {}  # a dict to hold feature_name: feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance  # add the name/value pair

importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
# Sort descending so the most important feature is plotted first
importances.sort_values(by='Gini-importance', ascending=False).plot(kind='bar', rot=45)
plt.show()
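If you would rather keep the question's NumPy/matplotlib plotting code, the key fix is to reorder the tick labels with the same `indices` used for the bar heights, i.e. `feature_names[indices]` instead of `feature_names`. A minimal sketch, with hypothetical feature names and importance values standing in for `x_dummies.columns` and `brf.feature_importances_`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for x_dummies.columns and brf.feature_importances_
feature_names = np.array(["age", "income", "tenure", "region_north"])
importances = np.array([0.10, 0.45, 0.30, 0.15])

# Feature indices sorted from most to least important
indices = np.argsort(importances)[::-1]

# Reorder the labels with the SAME indices as the bar heights
sorted_names = feature_names[indices]

plt.figure(figsize=(8, 8))
plt.title("Feature importances")
plt.bar(range(len(indices)), importances[indices], color="r", align="center")
plt.xticks(range(len(indices)), sorted_names, rotation=90)
plt.xlim([-1, len(indices)])
plt.show()
```

Positional indexing with an integer array works the same way on a pandas `Index`, so `x_dummies.columns[indices]` should give the sorted labels directly.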
Sam answered Oct 06 '22 07:10
