
How is feature importance calculated for GradientBoostingClassifier?

I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier. It exposes feature importance scores in its feature_importances_ attribute. How are these feature importances calculated?

I'd like to understand what algorithm scikit-learn is using, to help me understand how to interpret those numbers. The algorithm isn't listed in the documentation.

D.W. asked Jan 03 '23 23:01



1 Answer

This is documented elsewhere in the scikit-learn documentation. In particular, here is how it works:

For each tree, we calculate the importance of a feature F from the nodes that split on F: each such split contributes its decrease in impurity, weighted by the fraction of samples that traverse that node (see here). Then, we average those numbers across all trees (as described here).
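As a rough check of the per-tree calculation, the sketch below recomputes one tree's importances from its node arrays and compares them with what scikit-learn reports. It assumes that each split contributes its impurity decrease weighted by the number of samples reaching the node, which is how I understand the tree code to work. The dataset, sizes, seeds, and the helper name manual_tree_importances are made up for illustration. How the per-tree values are combined across the ensemble has varied between scikit-learn versions, so only a single tree is compared here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data; the sizes and seeds are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)
clf = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)


def manual_tree_importances(fitted_tree, n_features):
    """Hypothetical helper: recompute one tree's normalized importances."""
    t = fitted_tree.tree_
    w = t.weighted_n_node_samples  # training samples reaching each node
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # -1 marks a leaf, which has no split
            continue
        # Sample-weighted impurity decrease produced by this split.
        decrease = (
            w[node] * t.impurity[node]
            - w[left] * t.impurity[left]
            - w[right] * t.impurity[right]
        )
        importances[t.feature[node]] += decrease
    importances /= w[0]  # express the weights as fractions of the root samples
    total = importances.sum()
    return importances / total if total > 0 else importances


# estimators_ has shape (n_estimators, K); K is 1 for binary classification.
first_tree = clf.estimators_[0, 0]
manual = manual_tree_importances(first_tree, X.shape[1])
print(manual)
print(first_tree.feature_importances_)
```

The two printed arrays should agree up to floating-point error. A feature that the tree never splits on gets an importance of exactly 0.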

It is not described exactly how scikit-learn estimates the fraction of samples that will traverse a tree node that splits on feature F.

The interpretation: each score lies in the range [0, 1], and a higher score means the feature is more important. feature_importances_ is an array of shape (n_features,) whose values are non-negative and sum to 1.0.
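To see these properties on a fitted model, here is a minimal sketch. The synthetic dataset, its size, and the seeds are assumptions chosen only for illustration; only feature_importances_ comes from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data; the sizes and seeds are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

scores = clf.feature_importances_
print(scores.shape)         # one score per input feature
print((scores >= 0).all())  # scores are non-negative
print(scores.sum())         # scores are normalized to sum to 1.0

# Rank features from most to least important.
ranking = np.argsort(scores)[::-1]
for idx in ranking:
    print(f"feature {idx}: {scores[idx]:.3f}")
```

Because the scores are normalized, they are only meaningful relative to each other within one fitted model.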

D.W. answered Jan 10 '23 06:01
