Feature Importance with XGBClassifier

Tags:

python

scikit-learn

xgboost

Hopefully I'm reading this wrong, but the XGBoost library documentation mentions extracting feature importances through the feature_importances_ attribute, much like sklearn's random forest.

However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'

My code snippet is below:

from sklearn import datasets
import xgboost as xg
iris = datasets.load_iris()
mask = iris.target < 2   # arbitrarily removing class 2 so the labels are just 0 and 1
X = iris.data[mask]      # keep exactly the rows that match Y (no off-by-one shift)
Y = iris.target[mask]
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_  # raises the AttributeError above on the xgboost version in use

It seems that you can compute feature importance with the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier rather than Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on extracting feature importances? Is anyone else experiencing this?


asked Jul 05 '16 21:07

Minh Mai

2 Answers

As the comments indicate, I suspect your issue is a versioning one. However, if you can't or don't want to update, the following function should work for you.

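def get_xgb_imp(xgb, feat_names):
    # Newer xgboost exposes get_booster(); older releases used booster() instead
    booster = xgb.get_booster() if hasattr(xgb, 'get_booster') else xgb.booster()
    # get_fscore() maps 'f0', 'f1', ... to split counts; unused features are missing
    imp_vals = booster.get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f'+str(i), 0.)) for i in range(len(feat_names))}
    # Plain sum() also works on Python 3 dict views, unlike numpy.array(dict.values())
    total = sum(imp_dict.values())
    return {k: v/total for k, v in imp_dict.items()}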


>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> 
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> 
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}


answered Sep 23 '22 14:09

David

For xgboost, if you train with xgb.fit(), you can get the feature importances as follows:

import pandas as pd
# xgb is an XGBClassifier instance; x and y are the training data
xgb_model = xgb.fit(x, y)
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv')
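When the model is fit on a NumPy array, the keys in that table are xgboost's default names f0, f1, and so on, and features that never appear in a split are left out entirely. The hypothetical helper below (fscore_to_frame is an illustrative name, not an xgboost function) fills in those zeros, maps the keys to your own feature names, normalizes, and sorts. The split counts in the usage lines are made up for illustration, not output from a real model.

```python
import pandas as pd

def fscore_to_frame(fscore, feat_names):
    """Convert a get_fscore()-style dict such as {'f0': 17, 'f2': 11} into a
    DataFrame of named, normalized importances, sorted highest first."""
    # Features never used in a split are absent from the dict, so default to 0.
    counts = [float(fscore.get(f"f{i}", 0.0)) for i in range(len(feat_names))]
    df = pd.DataFrame({"feature": feat_names, "importance": counts})
    total = df["importance"].sum()
    if total > 0:  # avoid dividing by zero for a model with no splits
        df["importance"] = df["importance"] / total
    return df.sort_values("importance", ascending=False).reset_index(drop=True)

# Illustrative split counts, shaped like booster.get_fscore() output.
table = fscore_to_frame({"f0": 17, "f1": 11, "f2": 11, "f3": 10},
                        ["var1", "var2", "var3", "var4", "var5"])
print(table)
```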

import matplotlib.pyplot as plt
from xgboost import plot_importance
plot_importance(xgb_model)
plt.show()

answered Sep 23 '22 14:09

rosefun
