
Python - Scikit find variable importance for categorical variables

I'm trying to use scikit-learn in Python for a couple of different classifier problems (RF, GBM, etc.). In addition to building models and making predictions, I'd like to see the variable importance. I know there is a way to get the importances

importances = clf.feature_importances_
print(importances)

but how do I get something more refined that ties each importance to a variable name (i.e. like summary(gbm) or varImp(randomForest) in R), especially when it's a categorical variable with multiple levels?

asked Mar 19 '15 by screechOwl

People also ask

Can Sklearn handle categorical variables?

You can feed categorical variables directly to a random forest using the following approach: first, convert the feature's categories to numbers using scikit-learn's LabelEncoder; second, convert the label-encoded feature's type to string (object).
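For illustration, a minimal sketch of the first step using scikit-learn's LabelEncoder (the DataFrame and column name are made up):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data with one categorical feature
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

le = LabelEncoder()
df['color_encoded'] = le.fit_transform(df['color'])  # e.g. blue=0, green=1, red=2
print(le.classes_)  # original labels, indexed by their encoded value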

Can we use StandardScaler on categorical features?

The continuous variables need to be scaled, but a couple of categorical variables may also be of integer type. Applying StandardScaler to the whole frame would scale those integer-coded categorical variables as well, which is not what we want.
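One way around this, as a hedged sketch (the column names here are invented), is to scale only the continuous columns with a ColumnTransformer and pass the categorical ones through untouched:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# 'temp' is continuous; 'season' is an integer-coded category
df = pd.DataFrame({'temp': [9.8, 15.2, 22.1, 30.4],
                   'season': [1, 2, 3, 4]})

preprocess = ColumnTransformer(
    [('scale', StandardScaler(), ['temp'])],  # scale continuous columns only
    remainder='passthrough')                  # leave 'season' untouched
X = preprocess.fit_transform(df)
print(X)  # 'temp' standardized, 'season' codes unchanged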

How do you factor a categorical variable in Python?

In Python, unlike R, there is no built-in factor type, though pandas offers a close analogue. Factors in R are stored as vectors of integer values and can be labelled. If we have our data in a Series or DataFrame, we can convert these categories to numbers using the pandas Series' astype method and specify 'category'.
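A minimal sketch of that conversion (the data is hypothetical):

import pandas as pd

s = pd.Series(['low', 'high', 'medium', 'high'])
cat = s.astype('category')   # pandas' analogue of an R factor
print(cat.cat.categories)    # the labels ('levels' in R terms)
print(cat.cat.codes)         # the underlying integer codes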

How do you do feature selection on categorical variables?

One way is to encode the categorical variables using one-hot encoding (each categorical level becomes its own numerical column of 0s and 1s, where 0 means absent and 1 means present). Many prefer this method because no information is lost and the concept is easy to understand.
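As a sketch, pandas' get_dummies performs this encoding in one call (scikit-learn's OneHotEncoder is the equivalent transformer); the column name here is invented:

import pandas as pd

df = pd.DataFrame({'weather': ['clear', 'rain', 'clear', 'snow']})
dummies = pd.get_dummies(df['weather'], prefix='weather')
print(dummies)  # one indicator column per level of 'weather'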


1 Answer

The variable importance (or feature importance) is calculated for all the features that you fit your model to. This pseudo-code gives you an idea of how variable names and importances can be related:

import pandas as pd

train = pd.read_csv("train.csv")
cols = ['hour', 'season', 'holiday', 'workingday', 'weather', 'temp', 'windspeed']
clf = YourClassifier()  # placeholder - e.g. RandomForestClassifier() or GradientBoostingClassifier()
clf.fit(train[cols], train.targets)  # targets/labels

print(len(clf.feature_importances_))  # one importance score per fitted feature
print(len(cols))

You will see that the lengths of the two printed lists are the same - you can essentially zip the lists together or manipulate them however you wish. If you'd like to show variable importance nicely in a plot, you could use this:

import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(6 * 1.618, 6))
index = np.arange(len(cols))
plt.bar(index, clf.feature_importances_, color='black', alpha=0.5)
plt.xlabel('features')
plt.ylabel('importance')
plt.title('Feature importance')
plt.xticks(index, cols)  # bars are centre-aligned, so put the ticks at the bar positions
plt.tight_layout()
plt.show()

If you don't want to use this method (meaning that you are fitting all columns, not just the selected few set in the cols variable), then you could get the column/feature/variable names of your data with train.columns.values (and then map this list together with the variable importance list, or manipulate it in some other way).
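For instance, a minimal sketch of that mapping, assuming clf has been fitted as above:

ranked = sorted(zip(cols, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f'{name}: {score:.4f}')  # features from most to least important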

answered Oct 26 '22 by kasparg