
How do you access tree depth in Python's scikit-learn?

I'm using scikit-learn to create a Random Forest, but I want to find the individual depth of each tree. It seems like a simple attribute to have, but according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html), there is no way of accessing it.

If this isn't possible, is there a way of accessing the tree depth from a Decision Tree model?

Any help would be appreciated. Thank you.

Asked Dec 11 '15 by iltp38

People also ask

How do I get depth in random forest Python?

Iterate over the forest's fitted trees and read each tree's actual depth, e.g. [est.tree_.max_depth for est in forest.estimators_]. Note that the estimator's own max_depth attribute is the maximum allowed depth of each tree in the forest, not the actual depth. So, for example, a random forest trained with max_depth=10 would simply return: [10, 10, 10, ...]

How do you choose the depth of a decision tree?

There is no theoretical calculation of the best depth of a decision tree, to the best of my knowledge. So here is what you do: choose a range of tree depths to loop over (try to cover the whole area, so include small ones as well as very big ones); inside the loop, divide your dataset into train/validation splits (e.g. 70%/30%), fit a tree of each depth on the training split, and keep the depth that scores best on the validation split.
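The loop above can be sketched as follows (a minimal illustration using scikit-learn's train_test_split and DecisionTreeClassifier; the dataset and the list of candidate depths are placeholders, not a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# 70%/30% train/validation split, as suggested above
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Try a range of depths, from very shallow to very deep
scores = {}
for depth in [1, 2, 3, 5, 8, 13, 21]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    scores[depth] = clf.score(X_val, y_val)  # validation accuracy

best_depth = max(scores, key=scores.get)
```

In practice you would likely repeat the split several times (or use cross-validation) so the chosen depth is not an artifact of one particular split.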

How do we select the depth of the trees in random forest?

Generally you want as many trees as will improve your model. The depth of the tree should be enough to split each node to your desired number of observations. There has been some work that says best depth is 5-8 splits. It is, of course, problem and data dependent.

What is a decision tree in Python sklearn?

How decision trees are created will be covered in a later article, because here we are more focused on the implementation of the decision tree in the Sklearn library of Python. The decision tree is a white-box model: we can easily trace any particular condition of the model that results in either true or false.

What is the advantage of using scikit-learn for decision trees?

Scikit-learn offers a more efficient implementation for the construction of decision trees. A naive implementation would recompute the class label histograms (for classification) or the means (for regression) from scratch for each new split point along a given feature.
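A toy illustration of the incremental idea (this is not scikit-learn's actual code): when candidate split points along one feature are scanned in sorted order, one sample at a time moves from the right side of the split to the left, so both class histograms can be updated in O(1) per split instead of being recounted:

```python
from collections import Counter

def split_histograms(feature_values, labels):
    """Yield (threshold, left_hist, right_hist) for each candidate split,
    updating the class-label histograms incrementally instead of recounting."""
    order = sorted(range(len(labels)), key=lambda i: feature_values[i])
    left = Counter()          # class counts left of the split (value <= threshold)
    right = Counter(labels)   # class counts right of the split
    for i in order[:-1]:      # the last point leaves nothing on the right
        left[labels[i]] += 1  # move one sample to the left side...
        right[labels[i]] -= 1 # ...and out of the right side
        yield feature_values[i], dict(left), dict(+right)  # +right drops zeros
```

For example, split_histograms([1.0, 2.0, 3.0], ['a', 'a', 'b']) yields the histograms for the splits after the first and second samples.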

How do I install scikit-learn in Python?

How Do You Install Scikit-Learn in Python? Installing Scikit-Learn can be done using either the pip package manager or the conda package manager. Simply run the appropriate command in your terminal and let the package manager handle the installation for you:
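The commands the snippet refers to are the standard ones (use whichever package manager matches your environment):

```shell
# with pip
pip install scikit-learn

# or with conda
conda install scikit-learn
```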

How does scikit learn classification work in Python?

In this section, we will learn how Scikit learn classification works in Python. Classification is a form of data analysis that extracts models describing important data classes: a classifier sorts samples into a set of predefined categories.
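For example, scikit-learn's classification_report summarizes per-class precision, recall, f1-score, and support for a fitted classifier (the dataset and model below are purely illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit any classifier, then compare its predictions to the true labels
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)  # one row of precision/recall/f1/support per class
```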


1 Answer

Each instance of RandomForestClassifier has an estimators_ attribute, which is a list of DecisionTreeClassifier instances. The documentation shows that an instance of DecisionTreeClassifier has a tree_ attribute, which is an instance of the (undocumented, I believe) Tree class. Some exploration in the interpreter shows that each Tree instance has a max_depth parameter which appears to be what you're looking for -- again, it's undocumented.

In any case, if forest is your instance of RandomForestClassifier, then:

>>> [estimator.tree_.max_depth for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]

should do the trick.

Each estimator also has a get_depth() method that can be used to retrieve the same value with briefer syntax:

>>> [estimator.get_depth() for estimator in forest.estimators_]
[9, 10, 9, 11, 9, 9, 11, 7, 13, 10]

To avoid confusion, it should be noted that each estimator (and not each estimator's tree_) also has an attribute called max_depth, which returns the setting of the parameter rather than the depth of the actual tree. How estimator.get_depth(), estimator.tree_.max_depth, and estimator.max_depth relate to each other is clarified in the example below:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=3, random_state=4, max_depth=6)
iris = load_iris()
clf.fit(iris['data'], iris['target'])
[(est.get_depth(), est.tree_.max_depth, est.max_depth) for est in clf.estimators_]

Out:

[(6, 6, 6), (3, 3, 6), (4, 4, 6)] 

Setting max_depth to its default value None would allow the first tree to expand to depth 7, and the output would be:

[(7, 7, None), (3, 3, None), (4, 4, None)] 
Answered Oct 07 '22 by jme