 

How to obtain information gain from a scikit-learn DecisionTreeClassifier?

I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must be using information gain as a criterion for splitting the decision tree. What I need is the information gain for each feature at the root level, when it is about to split the root node.

Jagat asked May 06 '13

1 Answer

You can only access the information gain (or Gini impurity) for a feature that has actually been used as a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node splitting on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i].
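Note that best_error and init_error are internal attributes of the scikit-learn version current when this was written; they no longer exist in recent releases. In modern scikit-learn the same quantity can be recovered from the public tree_ arrays (impurity, weighted_n_node_samples, children_left/right, feature): the gain at a node is its impurity minus the weighted average impurity of its two children. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)
tree = clf.tree_

# Information gain at each internal node:
# gain = impurity(node) - weighted average impurity of its children
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # -1 marks a leaf: no split, no gain
        continue
    n = tree.weighted_n_node_samples[node]
    n_left = tree.weighted_n_node_samples[left]
    n_right = tree.weighted_n_node_samples[right]
    gain = (tree.impurity[node]
            - (n_left / n) * tree.impurity[left]
            - (n_right / n) * tree.impurity[right])
    print(f"node {node}: split on feature {tree.feature[node]}, gain = {gain:.4f}")
```

With criterion='entropy' the impurity array holds the node entropies, so the printed value for node 0 is the information gain of the split chosen at the root.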

For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64

If you want to access the entropy for each feature (at a certain split node), you need to modify the function find_best_split: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
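Alternatively, if all you need is the information gain of each feature at the root (as the question asks), you can compute it yourself without patching the Cython source. This is a sketch, not scikit-learn's implementation: it evaluates every observed value of a numeric feature as a threshold and keeps the best entropy-based gain, which mirrors what the splitter does at the root.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of the class labels y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def root_information_gain(x, y):
    """Best information gain over all threshold splits on one feature."""
    base = entropy(y)
    best = 0.0
    for t in np.unique(x)[:-1]:  # candidate thresholds (last value splits nothing off)
        mask = x <= t
        n_left, n = mask.sum(), len(y)
        gain = (base
                - (n_left / n) * entropy(y[mask])
                - ((n - n_left) / n) * entropy(y[~mask]))
        best = max(best, gain)
    return best

# Toy example: both features separate the two classes perfectly
X = np.array([[2.0, 1.0], [2.5, 1.5], [3.0, 4.0], [3.5, 4.5]])
y = np.array([0, 0, 1, 1])
for j in range(X.shape[1]):
    print(f"feature {j}: information gain = {root_information_gain(X[:, j], y):.4f}")
```

Run this on your own X and y to rank the features by the gain each would provide at the root.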

Peter Prettenhofer answered Sep 19 '22