I'm using xgboost to build a model, and I'm trying to find the importance of each feature using get_fscore(), but it returns {}.
My training code is:
import xgboost as xgb

# X is the feature matrix, Y the label vector
dtrain = xgb.DMatrix(X, label=Y)
watchlist = [(dtrain, 'train')]
param = {'max_depth': 6, 'learning_rate': 0.03}
num_round = 200
bst = xgb.train(param, dtrain, num_round, watchlist)
So is there any mistake in my training code? How do I get feature importance in xgboost?
There are 3 ways to compute feature importance in XGBoost: the built-in feature importance, permutation-based importance, and importance computed with SHAP values. The built-in importance is described next; a short sketch of the other two follows below.
Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node. The node probability is the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature.
XGBoost's feature importance tends to be more useful than that of the models mentioned above: it trains far faster than Random Forests, and it is far more reliable than linear models, so the resulting importance scores are usually much more accurate.
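For completeness, here is a minimal sketch of the other two approaches, assuming a fitted sklearn-API model model and held-out data X_val, y_val (these names are hypothetical; shap is a separate package installed with `pip install shap`):

from sklearn.inspection import permutation_importance
import shap

# Permutation-based importance: shuffle one column at a time and
# measure how much the model's score drops.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=42)
print(result.importances_mean)

# SHAP values: per-sample, per-feature contributions to the prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)
shap.summary_plot(shap_values, X_val)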
One very handy function XGBoost provides is plot_importance, which shows the F-score of each feature, i.e. that feature's importance to the model. This is helpful for feature selection, not only for your XGBoost model but also for any similar model you may run on the data.
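For instance, a minimal sketch that plots the importances of the bst Booster trained in the question (assuming matplotlib is installed):

import matplotlib.pyplot as plt
import xgboost as xgb

# Bar chart of per-feature F-scores (split counts) for the trained Booster
xgb.plot_importance(bst)
plt.show()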
In your code you can get the importance of each feature in dict form:
bst.get_score(importance_type='gain')
>>{'ftr_col1': 77.21064539577829,
'ftr_col2': 10.28690566363971,
'ftr_col3': 24.225014841466294,
'ftr_col4': 11.234086283060112}
Explanation: The train() API's Booster method get_score() is defined as:
get_score(fmap='', importance_type='weight')
In recent versions of xgboost, valid importance_type values are 'weight', 'gain', 'cover', 'total_gain', and 'total_cover'.
https://xgboost.readthedocs.io/en/latest/python/python_api.html
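A quick way to compare them, assuming a reasonably recent xgboost version and the bst Booster from above:

# Print every importance flavour side by side for the same model
for imp_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(imp_type, bst.get_score(importance_type=imp_type))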
Get the table containing scores and feature names, and then plot it.
import pandas as pd

# model is a fitted sklearn-API estimator (e.g. XGBClassifier); grab its Booster
feature_important = model.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
data.nlargest(40, columns="score").plot(kind='barh', figsize=(20, 10))  # plot top 40 features
Using the sklearn API and XGBoost >= 0.81:
clf.get_booster().get_score(importance_type="gain")
or
regr.get_booster().get_score(importance_type="gain")
For this to work correctly, when you call regr.fit (or clf.fit), X must be a pandas.DataFrame.
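A minimal end-to-end sketch under that assumption (the diabetes dataset is just an arbitrary demo choice):

import pandas as pd
import xgboost as xgb
from sklearn.datasets import load_diabetes

# Build a DataFrame so that column names survive into the Booster
raw = load_diabetes()
X = pd.DataFrame(raw.data, columns=raw.feature_names)
y = raw.target

regr = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.03)
regr.fit(X, y)

# Keys are now real column names instead of f0, f1, ...
print(regr.get_booster().get_score(importance_type="gain"))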