In scikit-learn, it's possible to access the entire tree structure, that is, each node of the tree. This makes it possible to explore which attribute is used at each split of the tree and which threshold value is used for the test.
The binary tree structure has 5 nodes and has the following tree structure:
node=0 test node: go to node 1 if X[:, 3] <= 0.800000011920929 else to node 2.
node=1 leaf node.
node=2 test node: go to node 3 if X[:, 2] <= 4.950000047683716 else to node 4.
node=3 leaf node.
node=4 leaf node.
Rules used to predict sample 0:
decision id node 0 : (X_test[0, 3] (= 2.4) > 0.800000011920929)
decision id node 2 : (X_test[0, 2] (= 5.1) > 4.950000047683716)
For a Random Forest, you can obtain the same information by looping over all the decision trees:
for tree in model.estimators_:
    # each element is a fitted decision tree; its nodes live in tree.tree_
    print(tree.tree_.node_count)
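To make the loop above concrete, here is a minimal sketch that prints the same kind of node listing for every tree in a small forest. The helper name describe_tree, the iris data, and the hyperparameters are my own choices for illustration; the tree_ attributes used (node_count, children_left, children_right, feature, threshold) are the ones scikit-learn exposes on a fitted tree.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def describe_tree(fitted_tree):
    """Return one text line per node of a fitted scikit-learn decision tree."""
    t = fitted_tree.tree_
    lines = []
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # both children are -1 for a leaf
            lines.append(f"node={node} leaf node.")
        else:
            lines.append(
                f"node={node} test node: go to node {left} "
                f"if X[:, {t.feature[node]}] <= {t.threshold[node]} "
                f"else to node {right}."
            )
    return lines


# Illustrative data and settings, not taken from the question.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0).fit(X, y)

forest_description = [describe_tree(est) for est in model.estimators_]
for i, lines in enumerate(forest_description):
    print(f"tree {i}: {len(lines)} nodes")
    print("\n".join(lines))
```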
Can the same information be extracted from a LightGBM model? That is, can you access: a) every tree and b) every node of a tree?
Yes, this is possible with
model._Booster.dump_model()["tree_info"]
which is used, for example, in lightgbm.plot_tree(). I must admit, though, that I haven't used it myself and don't know the details of the returned structure.
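As a rough sketch of how such a dump could be walked: each entry of "tree_info" holds a nested "tree_structure" dict, where split nodes carry keys such as split_feature, threshold, decision_type, left_child and right_child, and leaves carry leaf_value. The sample_dump below is a hand-written stand-in with made-up leaf values (so the snippet runs without a trained model), and the helper names iter_nodes and describe are hypothetical.

```python
def iter_nodes(node, depth=0):
    """Yield (depth, node_dict) for a node and all of its descendants."""
    yield depth, node
    for child_key in ("left_child", "right_child"):
        child = node.get(child_key)
        if child is not None:
            yield from iter_nodes(child, depth + 1)


def describe(node):
    """Format a single node of a dumped tree as text."""
    if "split_index" in node:
        return (f"split {node['split_index']}: feature {node['split_feature']} "
                f"{node['decision_type']} {node['threshold']}")
    return f"leaf {node.get('leaf_index')}: value={node['leaf_value']}"


# Hand-written stand-in for model._Booster.dump_model(); all values are made up.
sample_dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "tree_structure": {
                "split_index": 0,
                "split_feature": 3,
                "threshold": 0.8,
                "decision_type": "<=",
                "left_child": {"leaf_index": 0, "leaf_value": -1.0},
                "right_child": {
                    "split_index": 1,
                    "split_feature": 2,
                    "threshold": 4.95,
                    "decision_type": "<=",
                    "left_child": {"leaf_index": 1, "leaf_value": 0.5},
                    "right_child": {"leaf_index": 2, "leaf_value": 1.0},
                },
            },
        }
    ]
}

summary = []
for tree in sample_dump["tree_info"]:
    for depth, node in iter_nodes(tree["tree_structure"]):
        summary.append("  " * depth + describe(node))
print("\n".join(summary))
```

With a real model, replacing sample_dump by model._Booster.dump_model() should give the same kind of listing for every tree.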
There is also model._Booster.trees_to_dataframe(). It returns a pandas DataFrame containing information about all nodes of all trees.
The list of columns contained in that DataFrame and their meanings can be found in the official docs.