In R there are pre-built functions to plot the feature importance of a Random Forest model, but in Python such a method seems to be missing. I searched for a method in matplotlib.
model.feature_importances_
gives me the following:
array([2.32421835e-03, 7.21472336e-04, 2.70491223e-03,
       3.34521084e-03, 4.19443238e-03, 1.50108737e-03,
       3.29160540e-03, 4.82320256e-01, 3.14117333e-03])
Then, using the following plotting calls:
>>> pyplot.bar(range(len(model.feature_importances_)), model.feature_importances_)
>>> pyplot.show()
I get a bar plot, but I would like a horizontal bar plot with feature labels, sorted by importance. I also explored seaborn
and was not able to find a method.
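For reference, a labeled, sorted horizontal bar chart can be built with plain matplotlib. A minimal sketch using the importances from the question (the feature names here are made up, since the question doesn't list them):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# the importances printed in the question, with hypothetical feature names
importances = np.array([2.32421835e-03, 7.21472336e-04, 2.70491223e-03,
                        3.34521084e-03, 4.19443238e-03, 1.50108737e-03,
                        3.29160540e-03, 4.82320256e-01, 3.14117333e-03])
names = [f"feature_{i}" for i in range(len(importances))]

order = np.argsort(importances)  # ascending, so the largest bar ends up on top
plt.barh(range(len(order)), importances[order])
plt.yticks(range(len(order)), [names[i] for i in order])
plt.xlabel("importance")
plt.tight_layout()
plt.show()
```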
Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node. The node probability can be calculated as the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature.
The concept is really straightforward: We measure the importance of a feature by calculating the increase in the model's prediction error after permuting the feature. A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction.
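This permutation approach is available in scikit-learn as sklearn.inspection.permutation_importance (added in version 0.22). A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# toy data: 6 features, only 3 of which are informative
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# shuffle each column n_repeats times and record the drop in score
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```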
Importance is calculated for a single decision tree by the amount that each attribute split point improves the performance measure, weighted by the number of observations the node is responsible for.
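That weighted impurity decrease can be reproduced by hand from a fitted tree's tree_ attributes. The following sketch (synthetic data, my own variable names) recovers scikit-learn's feature_importances_ for a single decision tree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

t = clf.tree_
w = t.weighted_n_node_samples
imp = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, so no contribution
        continue
    # impurity decrease at this split, weighted by samples reaching each node
    gain = (w[node] * t.impurity[node]
            - w[left] * t.impurity[left]
            - w[right] * t.impurity[right])
    imp[t.feature[node]] += gain

imp /= imp.sum()  # normalise so the importances sum to 1
print(np.allclose(imp, clf.feature_importances_))
```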
Quick answer for data scientists that ain't got no time to waste:
Load the feature importances into a pandas Series indexed by your column names, then use its plot method. For a classifier model trained using X:
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
Slightly more detailed answer with a full example:
Assuming you trained your model with data contained in a pandas DataFrame, this is fairly painless if you load the feature importances into a pandas Series; you can then leverage its indexing to display the variable names easily. The plot argument kind='barh'
gives us a horizontal bar chart, but you could easily substitute kind='bar'
for a traditional bar chart with the feature names along the x-axis if you prefer.
nlargest(n)
is a pandas Series method which returns the subset of the series with the n largest values. This is useful if you have lots of features in your model and only want to plot the most important ones.
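A tiny illustration of nlargest, with made-up importances:

```python
import pandas as pd

# hypothetical importances keyed by feature name
imp = pd.Series({"Fare": 0.31, "Age": 0.28, "Pclass": 0.22,
                 "SibSp": 0.10, "Parch": 0.09})
top2 = imp.nlargest(2)  # the two largest values, largest first
print(top2.index.tolist())  # ['Fare', 'Age']
```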
A quick complete example using the classic Kaggle Titanic dataset...
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# don't forget this line if you're using jupyter!
%matplotlib inline

X = pd.read_csv("titanic_train.csv")
X = X[['Pclass', 'Age', 'Fare', 'Parch', 'SibSp', 'Survived']].dropna()
y = X.pop('Survived')

model = RandomForestClassifier()
model.fit(X, y)

(pd.Series(model.feature_importances_, index=X.columns)
   .nlargest(4)
   .plot(kind='barh'))  # some method chaining, because it's sexy!
Which will give you this:
Not exactly sure what you are looking for. I derived an example from here. As mentioned in the comment: you can change indices
to a list of labels in the line plt.yticks(range(X.shape[1]), indices)
if you want to customize the feature labels.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Build a forest and compute the feature importances
forest = ExtraTreesClassifier(n_estimators=250,
                              random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_],
             axis=0)
indices = np.argsort(importances)

# Plot the feature importances of the forest
plt.figure()
plt.title("Feature importances")
plt.barh(range(X.shape[1]), importances[indices],
         color="r", xerr=std[indices], align="center")
# If you want to define your own labels,
# change indices to a list of labels on the following line.
plt.yticks(range(X.shape[1]), indices)
plt.ylim([-1, X.shape[1]])
plt.show()