
Access trees and nodes from LightGBM model

In scikit-learn, it's possible to access the entire tree structure, that is, each node of the tree. This lets you explore which feature is used at each split of the tree and which threshold value is used for the test.

The binary tree structure has 5 nodes and has the following tree structure:
node=0 test node: go to node 1 if X[:, 3] <= 0.800000011920929 else to node 2.
    node=1 leaf node.
    node=2 test node: go to node 3 if X[:, 2] <= 4.950000047683716 else to node 4.
            node=3 leaf node.
            node=4 leaf node.

Rules used to predict sample 0:
decision id node 0 : (X_test[0, 3] (= 2.4) > 0.800000011920929)
decision id node 2 : (X_test[0, 2] (= 5.1) > 4.950000047683716)

For a random forest, you can obtain the same information by looping over all of its decision trees:

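for estimator in model.estimators_:
    # each fitted decision tree exposes its node arrays via tree_
    tree = estimator.tree_  # children_left, children_right, feature, threshold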

Can the same information be extracted from a LightGBM model? That is, can you access: a) every tree and b) every node of a tree?

Titus Pullo asked Nov 13 '18 12:11


2 Answers

Yes, this is possible with

model._Booster.dump_model()["tree_info"]

which is used, for example, by lightgbm.plot_tree(). I must admit, though, that I haven't used it myself and don't know the details of the returned structure.
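As a rough illustration of what walking that structure could look like, here is a minimal sketch. It assumes each entry of "tree_info" holds a nested "tree_structure" dict in which split nodes carry "split_feature", "threshold", "decision_type", "left_child" and "right_child", and leaves carry "leaf_value". That is my understanding of LightGBM's JSON dump, so check it against your installed version. The sample dict is hand-written to mirror the tree in the question (its leaf values are made up), and describe_tree is a hypothetical helper, not part of LightGBM.

```python
def describe_tree(node, depth=0):
    """Return one text line per node of a dumped tree, indented by depth."""
    indent = "    " * depth
    # Leaves have no children; a tree with no split is a bare leaf.
    if "leaf_value" in node:
        return [f"{indent}leaf {node.get('leaf_index', 0)}: value={node['leaf_value']}"]
    lines = [
        f"{indent}split {node['split_index']}: "
        f"feature {node['split_feature']} {node['decision_type']} {node['threshold']}"
    ]
    lines += describe_tree(node["left_child"], depth + 1)
    lines += describe_tree(node["right_child"], depth + 1)
    return lines


# Hand-written stand-in for model._Booster.dump_model()["tree_info"];
# the values are illustrative only, not real model output.
sample_tree_info = [
    {
        "tree_index": 0,
        "tree_structure": {
            "split_index": 0,
            "split_feature": 3,
            "threshold": 0.8,
            "decision_type": "<=",
            "left_child": {"leaf_index": 0, "leaf_value": -1.0},
            "right_child": {
                "split_index": 1,
                "split_feature": 2,
                "threshold": 4.95,
                "decision_type": "<=",
                "left_child": {"leaf_index": 1, "leaf_value": 0.5},
                "right_child": {"leaf_index": 2, "leaf_value": 1.0},
            },
        },
    }
]

for tree in sample_tree_info:
    print(f"tree {tree['tree_index']}")
    for line in describe_tree(tree["tree_structure"], depth=1):
        print(line)
```

With a fitted model, the same loop should work if sample_tree_info is replaced by the list returned by the call above, which covers both parts of the question: every tree, and every node within a tree.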

Joel answered Nov 14 '22 07:11


There is also model._Booster.trees_to_dataframe(). It returns a Pandas DataFrame containing information about all nodes of all trees, one row per node.

The list of columns contained in that DataFrame and their meanings can be found in the official docs.

BrunoF answered Nov 14 '22 08:11