
How to force decision tree to split into different classes

I'm building a decision tree, and I would like to force the algorithm to split the results into different classes after a node. The problem is that in the trees I get, after evaluating the condition (is X less than a certain value?), both branches lead to the same class (yes and yes, for example). I want "yes" and "no" as the two outcomes of the node. Here is an example of what I'm getting:

[image: the resulting decision tree]

This is the code generating the tree and the plot:

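import graphviz
from sklearn import tree

# Fit a shallow tree on the user data (users_data, users_target defined elsewhere)
clf = tree.DecisionTreeClassifier(max_depth=2)
clf = clf.fit(users_data, users_target)

# Export the fitted tree to DOT format, labelling features and classes
dot_data = tree.export_graphviz(clf, out_file=None,
                     feature_names=feature_names,
                     class_names=target_names,
                     filled=True, rounded=True,
                     special_characters=True)

# Render the DOT source (displays inline in a notebook)
graph = graphviz.Source(dot_data)
graph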

I expect to find "YES" and "NO" classes after the nodes. Currently, I'm getting the same class on both sides in the last levels, after the respective conditions.

Thanks!

Matias Eiletz asked Nov 02 '25 15:11


1 Answer

As it stands, your model indeed looks like it offers no further discrimination between the first- and second-level nodes; so, if you are certain that this is (more or less) optimal for your case, you can simply ask it to stop there by using max_depth=1 instead of 2:

clf = tree.DecisionTreeClassifier(max_depth=1)

Keep in mind however that in reality this can be far from optimal; have a look at the tree for the iris dataset from the scikit-learn docs:

[image: decision tree for the iris dataset, from the scikit-learn docs]

where you can see that, further down the tree levels, nodes with class=versicolor emerge from what look like "pure" nodes of class=virginica (and vice versa).
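As a side note on why a split can end in two leaves with the same label: the class shown in each node of the plot is only the majority class of the training samples reaching it, so a split can reduce impurity without changing that majority. Here is a minimal sketch on a made-up toy dataset (the single binary feature and the yes/no counts are assumptions for illustration, not the asker's data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with one binary feature:
# feature == 0 -> 10 "yes"; feature == 1 -> 6 "yes" and 4 "no".
X = np.array([[0]] * 10 + [[1]] * 10)
y = np.array(["yes"] * 10 + ["yes"] * 6 + ["no"] * 4)

clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

# Both leaves predict the same majority class ...
labels = clf.predict([[0], [1]])
# ... but their class proportions differ (columns follow clf.classes_).
proba = clf.predict_proba([[0], [1]])

print(clf.classes_)
print(labels)
print(proba)
```

If the two leaves report different proportions like this, the split still carries information (for example, for ranking or for a custom decision threshold), even though the printed class is the same on both sides.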

So, before deciding to prune the tree in advance to max_depth=1, you might want to check whether letting it grow further (i.e. by not specifying the max_depth argument, thus leaving it at its default value of None) might be better for your case.
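A quick way to see what leaving max_depth unset does is to fit both variants and compare their size. The sketch below uses the iris dataset as a stand-in, since the asker's users_data is not available; the variable names stump and full are just illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth=1 allows a single split, i.e. two leaves
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
# max_depth=None (the default) lets the tree grow until other stopping criteria apply
full = DecisionTreeClassifier(random_state=0).fit(X, y)

for name, model in [("max_depth=1", stump), ("max_depth=None", full)]:
    print(name,
          "depth:", model.get_depth(),
          "leaves:", model.get_n_leaves(),
          "training accuracy:", model.score(X, y))
```

Note that training accuracy tends to rise with depth almost by construction, so on its own it is not evidence that the deeper tree generalizes better.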

Everything depends on why exactly you are doing this (i.e. your business case): if it is exploratory, you might very well stop at max_depth=1; if it is predictive, you should consider which configuration maximizes an appropriate metric (here, most probably the accuracy).
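For the predictive case, one common way to compare configurations is cross-validated accuracy over a few candidate depths. This is only a sketch: it uses the iris dataset in place of the asker's data, and the candidate depths and cv=5 are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate max_depth
scores = {}
for depth in [1, 2, 3, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores[depth] = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

# Pick the depth with the highest mean accuracy
best_depth = max(scores, key=scores.get)
print(scores)
print("best max_depth:", best_depth)
```

With your own data, replace X and y with users_data and users_target, and swap scoring="accuracy" for another metric if accuracy is not appropriate (e.g. with heavily imbalanced classes).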

desertnaut answered Nov 04 '25 06:11


