I'm using scikit-learn to create a Random Forest. However, I want to find the individual depth of each tree. It seems like a simple attribute to have, but according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) there is no way of accessing it.
If this isn't possible, is there a way of accessing the tree depth from a Decision Tree model?
Any help would be appreciated. Thank you.
n_estimators = len(forest.estimators_), for good measure. This answer is incorrect: it tells you the maximum allowed depth of each tree in the forest, not the actual depth. So, for example, a random forest trained with max_depth=10 will return [10, 10, 10, ...].
There is no theoretical calculation of the best depth of a decision tree, to the best of my knowledge. So here is what you do:
- Choose a set of tree depths to loop over (try to cover the whole range, so include small ones and very big ones as well).
- Inside the loop, divide your dataset into train/validation splits (e.g. 70%/30%) and score each depth on the validation split.
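The loop described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: load_iris stands in for your own dataset, and the candidate depths are arbitrary placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; replace with your own X, y.
X, y = load_iris(return_X_y=True)

# 70%/30% train/validation split, as suggested above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Try small and large depths to cover the whole range.
scores = {}
for depth in [1, 2, 3, 5, 8, 13, 21]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    scores[depth] = clf.score(X_val, y_val)  # validation accuracy

best_depth = max(scores, key=scores.get)
print(best_depth, scores[best_depth])
```

A single split can be noisy; averaging over several random splits (or using cross-validation) gives a more stable choice of depth.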
Generally you want as many trees as will improve your model. The depth of the tree should be enough to split each node down to your desired number of observations. Some work suggests the best depth is 5-8 splits, but it is, of course, problem- and data-dependent.
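Since both knobs interact, one common way to settle them empirically is a cross-validated grid search. This is only a sketch: the parameter grid below is illustrative, and load_iris again stands in for your data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)  # stand-in dataset

# Illustrative grid: a few tree counts and depths (None = grow fully).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [5, 8, None]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

grid.best_params_ then reports which combination scored best under cross-validation on your data.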
Each instance of RandomForestClassifier has an estimators_ attribute, which is a list of DecisionTreeClassifier instances. The documentation shows that an instance of DecisionTreeClassifier has a tree_ attribute, which is an instance of the (undocumented, I believe) Tree class. Some exploration in the interpreter shows that each Tree instance has a max_depth attribute, which appears to be what you're looking for -- again, it's undocumented.

In any case, if forest is your instance of RandomForestClassifier, then:

>>> [estimator.tree_.max_depth for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]

should do the trick.
Each estimator also has a get_depth() method that can be used to retrieve the same value with briefer syntax:

>>> [estimator.get_depth() for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]

To avoid mix-ups, note that each estimator (as opposed to each estimator's tree_) also has an attribute called max_depth, which returns the setting of the parameter rather than the depth of the actual tree. How estimator.get_depth(), estimator.tree_.max_depth, and estimator.max_depth relate to each other is clarified in the example below:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=3, random_state=4, max_depth=6)
iris = load_iris()
clf.fit(iris['data'], iris['target'])
[(est.get_depth(), est.tree_.max_depth, est.max_depth) for est in clf.estimators_]
Out:
[(6, 6, 6), (3, 3, 6), (4, 4, 6)]
Setting max_depth to the default value None would allow the first tree to expand to depth 7, and the output would be:
[(7, 7, None), (3, 3, None), (4, 4, None)]