The sklearn DecisionTreeClassifier has an attribute called "splitter", which is set to "best" by default. What does setting it to "best" or "random" do? I couldn't find enough information in the official documentation.
min_samples_split specifies the minimum number of samples required to split an internal node, while min_samples_leaf specifies the minimum number of samples required to be at a leaf node. For instance, if min_samples_split = 5 and there are 7 samples at an internal node, then the split is allowed.
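A minimal sketch of how these two constraints show up in a fitted tree; the dataset and the parameter values here are illustrative, not from the answer above:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy data; any small classification set behaves the same way.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

tree = DecisionTreeClassifier(min_samples_split=5, min_samples_leaf=2,
                              random_state=0).fit(X, y)

t = tree.tree_
is_leaf = t.children_left == -1

# Every node that was actually split held at least min_samples_split samples...
print("smallest split node:", t.n_node_samples[~is_leaf].min())  # >= 5
# ...and every leaf holds at least min_samples_leaf samples.
print("smallest leaf:", t.n_node_samples[is_leaf].min())         # >= 2
```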
With the Gini index criterion, the attribute and split point giving the maximum reduction in impurity (the largest Gini gain) are selected as the best split.
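A small worked sketch of that selection rule, computing the Gini impurity of a hypothetical parent node and the impurity reduction of one candidate split (the labels are made up for illustration):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((cnt / n) ** 2 for cnt in counts.values())

# Hypothetical parent node and one candidate split of its samples.
parent = [0, 0, 0, 0, 1, 1, 1]            # 7 samples
left, right = [0, 0, 0, 0], [1, 1, 1]      # a perfectly separating candidate

n = len(parent)
weighted_child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
reduction = gini(parent) - weighted_child

print(f"parent Gini = {gini(parent):.3f}")            # ~0.490
print(f"weighted child Gini = {weighted_child:.3f}")  # 0.0 for this candidate
print(f"impurity reduction = {reduction:.3f}")        # the quantity to maximize
```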
The process of dividing a single node into multiple nodes is known as splitting. A leaf node, also known as a terminal node, is a node that does not split into further nodes. A branch, sometimes known as a sub-tree, is a section of a decision tree. Pruning, the removal of sub-nodes, is the concept diametrically opposite to splitting.
Low values of min_samples_split and min_samples_leaf let the model separate very small groups of samples. A low min_samples_split, for example, allows the decision tree to split a node containing as few as 2 samples into different groups, while min_samples_leaf dictates the minimum number of samples that must end up in each leaf ("classification").
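A quick comparison sketch of this effect; the dataset and the specific values (2/1 versus 20/10) are just assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Low values: the tree may keep splitting until leaves hold a single sample.
loose = DecisionTreeClassifier(min_samples_split=2, min_samples_leaf=1,
                               random_state=42).fit(X, y)

# Higher values: splits stop earlier and leaves stay larger.
strict = DecisionTreeClassifier(min_samples_split=20, min_samples_leaf=10,
                                random_state=42).fit(X, y)

print("loose tree leaves :", loose.get_n_leaves())
print("strict tree leaves:", strict.get_n_leaves())
```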
Short answer:
RandomSplitter picks a **random split point on each chosen feature**, whereas BestSplitter goes through **all possible split points on each chosen feature**.
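To see the difference in practice, here is a small sketch fitting both options on the same data; the dataset and random_state are arbitrary choices, and the exact depths and leaf counts will vary:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# "best": at each node, evaluate every candidate threshold on each feature
# considered and keep the one with the largest impurity reduction.
best_tree = DecisionTreeClassifier(splitter="best", random_state=0).fit(X, y)

# "random": at each node, draw one random threshold per candidate feature
# (between that feature's min and max at the node) and keep the best of those.
random_tree = DecisionTreeClassifier(splitter="random", random_state=0).fit(X, y)

# The random splitter often produces a deeper, less compact tree.
print("best   :", best_tree.get_depth(), "levels,", best_tree.get_n_leaves(), "leaves")
print("random :", random_tree.get_depth(), "levels,", random_tree.get_n_leaves(), "leaves")
```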
Longer explanation:
This is clear when you go through _splitter.pyx.
In fact, the "random" option is used to implement the extremely randomized trees in sklearn. In a nutshell, it means that the splitting algorithm still traverses the candidate features, but for each feature it only chooses the splitting point at random, between that feature's maximum and minimum values at the node. If you are interested in the algorithm's details, you can refer to the paper [1]. Moreover, if you are interested in the detailed implementation of this algorithm, you can refer to this page.
[1] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
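As a simplified illustration of the splitting-point choice described above (not sklearn's actual Cython code, which lives in _splitter.pyx), here is a hypothetical helper that draws one threshold uniformly between a feature's minimum and maximum values at a node:

```python
import numpy as np

def random_threshold_split(feature_values, rng):
    """Pick one split point uniformly between the feature's min and max
    at this node, mimicking the "random" splitter described above
    (simplified sketch; the real logic is in sklearn's _splitter.pyx)."""
    lo, hi = feature_values.min(), feature_values.max()
    if lo == hi:               # constant feature: no valid split exists
        return None
    return rng.uniform(lo, hi)

# Toy node: values of one feature for the samples reaching this node.
rng = np.random.default_rng(0)
values = np.array([0.2, 1.5, 3.1, 4.8, 7.0])
print("random split point:", random_threshold_split(values, rng))
```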