 

Why is the decision tree structure only a binary tree for sklearn DecisionTreeClassifier?

As we can see from the sklearn documentation here, and from my own experiments, the tree structure of DecisionTreeClassifier is always a binary tree. Whether the criterion is gini or entropy, each DecisionTreeClassifier node can only have 0, 1, or 2 child nodes.
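For example, here is a minimal sketch of how the number of children per node can be checked (the iris dataset is used purely as an illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

left = clf.tree_.children_left    # index of the left child, -1 for a leaf
right = clf.tree_.children_right  # index of the right child, -1 for a leaf

for node in range(clf.tree_.node_count):
    n_children = int(left[node] != -1) + int(right[node] != -1)
    print(f"node {node}: {n_children} children")  # 0 for leaves, 2 for splits
```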

But in the decision tree introduction slides (page 3), each node of a theoretical decision tree can have more than 2 child nodes.

So my question is: why is the decision tree structure only a binary tree (each DecisionTreeClassifier node can only have 1 or 2 child nodes) for sklearn DecisionTreeClassifier? Can we get a tree structure with more than 2 child nodes for DecisionTreeClassifier?

asked Aug 13 '16 by ybdesire

1 Answer

It is because sklearn's approach is to work with numerical features, not categorical ones. With a numerical feature it is relatively hard to build a nice splitting rule with an arbitrary number of thresholds (which would be required to produce more than 2 children). For categorical features, on the other hand (used in the slides you link), another option is to give a node as many children as the feature has possible values. Both approaches have their own problems: the categorical approach becomes nearly intractable when a feature has many possible values, while the numerical approach requires a particular feature encoding (one-hot for categoricals). One-hot encoding means you can still express the same tree, but instead of a "species" node with 3 children [dog, cat, human] you get a deeper binary tree with decisions [dog, not dog], and then, under "not dog", [cat, human].
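For illustration, a minimal sketch of that one-hot approach (the "species" column and labels below are made up): after encoding, the fitted tree expresses the 3-way split as a chain of binary yes/no questions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# made-up categorical data: one "species" column with 3 possible values
df = pd.DataFrame({"species": ["dog", "cat", "human", "dog", "cat", "human"]})
y = [0, 1, 2, 0, 1, 2]  # hypothetical labels, one class per species

# one-hot encode: "species" becomes 3 binary columns
X = pd.get_dummies(df["species"], prefix="species")

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
# the printed tree splits on "species_dog <= 0.5", then "species_cat <= 0.5":
# a deeper binary tree instead of a single node with 3 children
```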

So the short answer is no, you cannot get more than 2 children per node with this implementation; however, this is not a real limitation in practice.

answered Sep 28 '22 by lejlot