I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must be using information gain as a criterion for splitting the decision tree. What I need is the information gain for each feature at the root level, when it is about to split the root node.
You can only access the information gain (or Gini impurity) for a feature that has actually been used at a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node when splitting on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i].
For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64
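Note that best_error and init_error come from an older scikit-learn release; in recent versions the per-node entropy (or Gini) is exposed as tree_.impurity instead. As a rough sketch of reading the gain actually realized at the root split with a current scikit-learn (the iris data is used purely for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

t = clf.tree_
root, left, right = 0, t.children_left[0], t.children_right[0]
n = t.weighted_n_node_samples

# Information gain at the root = parent entropy minus the
# weighted average entropy of the two child nodes.
gain = t.impurity[root] - (
    n[left] / n[root] * t.impurity[left]
    + n[right] / n[root] * t.impurity[right]
)
print("root splits on feature", t.feature[root], "with information gain", gain)
```

This only gives the gain for the feature the tree actually chose at the root, not for every candidate feature.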
If you want to access the entropy for each candidate feature (at a given split node), you need to modify the function find_best_split:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
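If patching find_best_split is not an option, a lighter-weight workaround is to fit a depth-1 stump on each feature separately and read off its realized gain: the best stump split on a feature gives the information gain that feature would offer at the root. A minimal sketch, assuming the iris data and a hypothetical helper name for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def best_root_gain(x_col, y):
    """Information gain of the best split a depth-1 stump finds on one feature."""
    stump = DecisionTreeClassifier(criterion="entropy", max_depth=1).fit(
        x_col.reshape(-1, 1), y
    )
    t = stump.tree_
    if t.node_count < 3:  # no useful split was found on this feature
        return 0.0
    n = t.weighted_n_node_samples
    left, right = t.children_left[0], t.children_right[0]
    return t.impurity[0] - (
        n[left] * t.impurity[left] + n[right] * t.impurity[right]
    ) / n[0]

for j in range(X.shape[1]):
    print("feature", j, "information gain at the root:", best_root_gain(X[:, j], y))
```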