How is the feature score(/importance) in the XGBoost package calculated?

1 Answer

This metric simply counts how many times each feature is split on, summed over all trees. It is analogous to the Frequency metric in the R version: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf

It is about as basic a feature importance metric as you can get, i.e. how many times was this variable split on?

The code for this method shows that it simply counts the occurrences of a given feature across all the trees.

Source: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953

```python
def get_fscore(self, fmap=''):
    """Get feature importance of each feature.

    Parameters
    ----------
    fmap: str (optional)
       The name of feature map file
    """
    trees = self.get_dump(fmap)        # dump all the trees to text
    fmap = {}
    for tree in trees:                 # loop through the trees
        for line in tree.split('\n'):  # one node per line
            arr = line.split('[')
            if len(arr) == 1:          # no '[': not a split node, skip it
                continue
            fid = arr[1].split(']')[0]  # text between the brackets, e.g. 'f0<0.5'
            fid = fid.split('<')[0]     # keep the part before '<': the feature name

            if fid not in fmap:  # if the feature id hasn't been seen yet
                fmap[fid] = 1    # add it
            else:
                fmap[fid] += 1   # else increment it
    return fmap  # counts of how many times each variable was split on
```
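To see the counting in isolation, here is a minimal sketch that applies the same text parsing to two hand-written tree dumps, with no trained model needed. The dump strings and the `count_splits` helper are made up for illustration; they only mimic the `[feature<threshold]` split lines that the method above parses, and real dumps may contain more fields.

```python
from collections import Counter

# Hypothetical text dumps for two small trees, written by hand to mimic
# the "[feature<threshold]" split lines that get_fscore parses.
SAMPLE_TREES = [
    "0:[f0<0.5] yes=1,no=2,missing=1\n"
    "\t1:[f1<1.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=0.1\n"
    "\t\t4:leaf=-0.2\n"
    "\t2:leaf=0.3\n",
    "0:[f0<0.7] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.05\n"
    "\t2:leaf=-0.05\n",
]


def count_splits(trees):
    """Count how many split nodes use each feature across all tree dumps."""
    counts = Counter()
    for tree in trees:
        for line in tree.split("\n"):
            parts = line.split("[")
            if len(parts) == 1:  # leaf or empty line: no "[...]" condition
                continue
            condition = parts[1].split("]")[0]  # e.g. "f0<0.5"
            feature = condition.split("<")[0]   # e.g. "f0"
            counts[feature] += 1
    return dict(counts)


scores = count_splits(SAMPLE_TREES)
print(scores)  # {'f0': 2, 'f1': 1}
```

Here `f0` is split on once in each tree and `f1` once in the first tree, so the score says nothing about how much each split improved the model, only how often the feature was used.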

answered Oct 03 '22 17:10

T. Scharf


Tags: python, r, classification, feature-selection, xgboost

ishido
