I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier, which exposes feature importance scores via feature_importances_. How are these feature importances calculated?
I'd like to know what algorithm scikit-learn uses, so I can interpret those numbers correctly. The algorithm isn't listed in the documentation.
This is documented elsewhere in the scikit-learn documentation. In particular, here is how it works:
For each tree, the feature importance of a feature F is calculated as the fraction of samples that traverse a node that splits on feature F. Those per-tree scores are then averaged across all trees in the ensemble.

The documentation does not describe exactly how scikit-learn estimates the fraction of samples that will traverse a node that splits on feature F.
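To make the description above concrete, here is a minimal sketch of that computation using the public tree_ attributes of the fitted estimators. It is an illustration of the description, not scikit-learn's internal formula, so its output may not match feature_importances_ exactly; the toy dataset from make_classification and the hyperparameters are just assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy data and model, purely for illustration
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

def tree_importance(est, n_features):
    """Per-tree score: for each feature F, the fraction of samples that
    reach a node splitting on F (as described in the answer above)."""
    t = est.tree_
    scores = np.zeros(n_features)
    root_samples = t.weighted_n_node_samples[0]  # samples reaching the root
    for node in range(t.node_count):
        f = t.feature[node]
        if f >= 0:  # internal split node; leaves have feature == -2
            scores[f] += t.weighted_n_node_samples[node] / root_samples
    return scores

# Average the per-tree scores over all trees in the ensemble
manual = np.mean(
    [tree_importance(est, X.shape[1]) for est in clf.estimators_.ravel()], axis=0
)
manual /= manual.sum()  # normalize so the scores sum to 1.0

print(manual)
print(clf.feature_importances_)  # may differ from this sketch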
The interpretation: scores are in the range [0, 1], and higher scores mean the feature is more important. The result is an array of shape (n_features,) whose values are non-negative and sum to 1.0.
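As a quick usage note (assuming the fitted clf from the sketch above), you can rank features by these scores:

```python
import numpy as np

# Rank features from most to least important
order = np.argsort(clf.feature_importances_)[::-1]
for i in order:
    print(f"feature {i}: {clf.feature_importances_[i]:.3f}")
```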