
What is the output of clf.tree_.feature?

I observed that scikit-learn's clf.tree_.feature occasionally returns negative values, for example -2. As far as I understand, clf.tree_.feature is supposed to return the positional index of each feature. Given an array of feature names ['feature_one', 'feature_two', 'feature_three'], -2 would then refer to feature_two. I am surprised by the use of a negative index; it would make more sense to refer to feature_two by index 1 (-2 is a reference convenient for human digestion, not for machine processing). Am I reading it correctly?

Update: Here is an example:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_ordering():
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=10, random_state=99)
    dt.fit(X, Y)
    print(dt.tree_.feature)

Here are the files X and Y

Here is the output:

    [ 8  9 -2 -2  9  4 -2  9  8 -2 -2  0  0  9  9  8 -2 -2  9 -2 -2  6 -2 -2 -2
  2 -2  9  8  6  9 -2 -2 -2  8  9 -2  9  6 -2 -2 -2  6 -2 -2  9 -2  6 -2 -2
  2 -2 -2]
user1700890 asked Sep 26 '16 16:09



1 Answer

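By reading the Cython source code for the tree generator, we see that the -2 values are just dummy values for the feature split attribute of leaf nodes. A leaf does not split on any feature, so -2 is a sentinel meaning "undefined", not a negative index into the list of feature names.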

Line 63

TREE_UNDEFINED = -2

Line 359

if is_leaf:
    # Node is not expandable; set node as leaf
    node.left_child = _TREE_LEAF
    node.right_child = _TREE_LEAF
    node.feature = _TREE_UNDEFINED
    node.threshold = _TREE_UNDEFINED
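
To make the reading concrete, here is a minimal sketch (plain Python, not scikit-learn code) of how the feature array can be interpreted. The array holds the first few entries of the output shown in the question; the names in feature_names and the helper describe_nodes are hypothetical and exist only for illustration.

```python
# Sentinel value quoted from the Cython source above.
TREE_UNDEFINED = -2

# First seven entries of the tree_.feature output from the question.
feature = [8, 9, -2, -2, 9, 4, -2]

# Hypothetical feature names; the real column names are not given in the question.
feature_names = [f"feature_{i}" for i in range(10)]


def describe_nodes(feature, feature_names):
    """Return one label per node: the split feature's name, or 'leaf'."""
    labels = []
    for f in feature:
        if f == TREE_UNDEFINED:
            # Leaf node: no split, so there is no feature to look up.
            labels.append("leaf")
        else:
            # Internal node: f is an ordinary zero-based column index.
            labels.append(feature_names[f])
    return labels


labels = describe_nodes(feature, feature_names)
for node_id, label in enumerate(labels):
    print(node_id, label)
```

Non-negative entries are ordinary zero-based column indices, so feature_two in the question's example would appear as 1, never as -2. In scikit-learn itself the same constant is importable as sklearn.tree._tree.TREE_UNDEFINED, which avoids hard-coding -2.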
absolutelyNoWarranty answered Oct 11 '22 18:10