I have built a decision tree with scikit-learn, viz. sklearn.tree.DecisionTreeClassifier().fit(x, y).
How do I get the Gini indices for all candidate nodes at each step? graphviz only gives me the Gini index of the node with the lowest Gini index, i.e. the node used for the split.
For example, the image below (from graphviz) tells me the Gini index of the Pclass_lowVMid right node, which is 0.408, but not the Gini index of Pclass_lower or Sex_male at that step. I only know that the Gini indices of Pclass_lower and Sex_male must be greater than (0.408*0.7 + 0), but that's it.
Using export_graphviz shows the impurity for all nodes, at least in version 0.20.1.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from graphviz import Source

data = load_iris()
X, y = data.data, data.target

clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

# The rendered tree shows the gini impurity of every node, not just the split node
graph = Source(export_graphviz(clf, out_file=None, feature_names=data.feature_names))
graph.format = 'png'
graph.render('dt', view=True);
The impurity values for all nodes are also accessible in the impurity attribute of the tree_ object:
clf.tree_.impurity
array([0.66666667, 0. , 0.5 , 0.16803841, 0.04253308])
Weighted Gini index of a split = (Gini index of left node) * (no. of samples in left node / total samples in left and right nodes) + (Gini index of right node) * (no. of samples in right node / total samples in left and right nodes). So here it will be:
Gini index of pclass = 0 * (3/10) + 0.408 * (7/10) = 0.2856
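If you want these weighted values for every split programmatically rather than reading them off the plot, the arrays on clf.tree_ (children_left, children_right, n_node_samples, impurity) can be combined. Below is a minimal sketch, assuming the iris classifier fitted above; it recomputes the weighted Gini of each split from the impurities of its two child nodes:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# refit the same tree as above so this snippet is self-contained
data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(data.data, data.target)

tree = clf.tree_
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:
        # leaf node: no split to evaluate
        continue
    n = tree.n_node_samples[node]
    n_left, n_right = tree.n_node_samples[left], tree.n_node_samples[right]
    # weighted Gini of the split = sum over children of (child samples / parent samples) * child impurity
    weighted = (n_left / n) * tree.impurity[left] + (n_right / n) * tree.impurity[right]
    print(f"node {node}: own gini = {tree.impurity[node]:.4f}, "
          f"weighted gini after split = {weighted:.4f}")

Leaves are marked by children_left == -1, so the loop only reports internal (split) nodes.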