What are the roles of min_samples_split and min_samples_leaf in RandomForestClassifier? [duplicate]

I am fitting a RandomForestClassifier and came across two parameters: min_samples_split and min_samples_leaf.

Do I need to set both min_samples_split and min_samples_leaf?

I think I just need one of them since one is effectively half of the other. Am I correct in my understanding?

AjayC asked Aug 31 '25 21:08


1 Answer

min_samples_split is the minimum number of samples a node must contain before it can be split. For instance, if min_samples_split = 6 and a node contains 4 samples, the split will not happen (regardless of entropy).
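To see this rule in action, here is a minimal sketch using scikit-learn's spelling of the parameter, min_samples_split. It uses a single DecisionTreeClassifier instead of a forest, which is my own simplification so the result is deterministic; as far as I know the forest hands the same parameter to each of its trees. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 4 samples, one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# The root holds 4 samples, fewer than min_samples_split=6, so it is never split.
blocked = DecisionTreeClassifier(min_samples_split=6, random_state=0).fit(X, y)

# With the default-like value of 2, the root is allowed to split.
allowed = DecisionTreeClassifier(min_samples_split=2, random_state=0).fit(X, y)

print(blocked.tree_.node_count)  # 1 (root only)
print(allowed.tree_.node_count)  # 3 (root plus two leaves)
```

The blocked tree stays a single node even though the data is perfectly separable, which is the "regardless of entropy" part.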

min_samples_leaf, on the other hand, is the minimum number of samples required in a leaf node. For example, a node containing 5 samples could be split into two leaf nodes of size 2 and 3. But if min_samples_leaf = 3, that split will not occur, because the minimum leaf size is 3 and you can't create a new node with only 2 samples.
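The 5-sample example can be checked with a short sketch. Again this uses a single DecisionTreeClassifier rather than a full forest (my simplification, to keep the outcome deterministic), scikit-learn's spelling min_samples_leaf, and made-up toy data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 5 samples whose natural split is 2 vs 3.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1, 1])

# Any split of 5 samples leaves one side with at most 2 samples,
# so min_samples_leaf=3 rules out every candidate split.
strict = DecisionTreeClassifier(min_samples_leaf=3, random_state=0).fit(X, y)

# min_samples_leaf=2 permits the 2 / 3 split.
relaxed = DecisionTreeClassifier(min_samples_leaf=2, random_state=0).fit(X, y)

# Leaves are the nodes without children (children_left == -1).
is_leaf = relaxed.tree_.children_left == -1
leaf_sizes = sorted(relaxed.tree_.n_node_samples[is_leaf].tolist())

print(strict.tree_.node_count)   # 1 (no split possible)
print(relaxed.tree_.node_count)  # 3
print(leaf_sizes)                # [2, 3]
```

So the two parameters constrain different things: one looks at the size of the node before the split, the other at the sizes of the children after it. A node needs at least 2 * min_samples_leaf samples for any split to be valid, which is probably where the "half of the other" intuition comes from, but min_samples_split can still be set higher than that independently.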

You can take a look at this and this for further reading.

Update: the difference in behaviour between RandomForestClassifier and GradientBoostingClassifier is largely due to how they are trained (gradient boosting is an ensemble of sequentially trained estimators). You can read more about it here to understand the internal workings of gradient boosting.

Gambit1614 answered Sep 03 '25 11:09