I observed that scikit-learn's clf.tree_.feature occasionally returns negative values, for example -2. As far as I understand, clf.tree_.feature is supposed to return the sequential indices of the features. If we have an array of feature names
['feature_one', 'feature_two', 'feature_three']
, then -2 would refer to feature_two. I am surprised by the use of a negative index; it would make more sense to refer to feature_two by index 1 (-2 is a reference convenient for humans, not for machine processing). Am I reading this correctly?
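For reference, here is the ordinary Python negative-indexing behaviour that my reading assumes (the feature names are invented for illustration):
features = ['feature_one', 'feature_two', 'feature_three']

# Python negative indices count from the end of the list,
# so -2 points at the second-to-last element:
print(features[-2])  # feature_two
print(features[1])   # feature_two, the equivalent non-negative index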
Update: Here is an example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_ordering():
    # Load the training data from the attached files
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=10, random_state=99)
    dt.fit(X, Y)
    print(dt.tree_.feature)
Here are the files X and Y
Here is the output:
[ 8 9 -2 -2 9 4 -2 9 8 -2 -2 0 0 9 9 8 -2 -2 9 -2 -2 6 -2 -2 -2
2 -2 9 8 6 9 -2 -2 -2 8 9 -2 9 6 -2 -2 -2 6 -2 -2 9 -2 6 -2 -2
2 -2 -2]
value shows how the samples tested for information gain are split up at each node. So at the root node, 32561 samples are divided into two children of 24720 and 7841 samples respectively.
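As a sketch of how such counts can be verified on a fitted tree (assuming dt fitted as in the question; the 32561/24720/7841 figures come from that answerer's own data, not from X.csv):
tree = dt.tree_
root = tree.n_node_samples[0]                        # samples reaching the root
left = tree.n_node_samples[tree.children_left[0]]    # samples sent to the left child
right = tree.n_node_samples[tree.children_right[0]]  # samples sent to the right child
print(root, '=', left, '+', right)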
max_leaf_nodes – Maximum number of leaf nodes a decision tree can have. max_features – Maximum number of features that are taken into account when splitting each node.
max_depth – The maximum depth of the tree. In our case, we use a depth of two to build our decision tree. The default value is None, which will often result in over-fitted decision trees.
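A minimal sketch of these parameters in use (the parameter names are real DecisionTreeClassifier arguments; the dataset and the chosen values are only illustrative):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    max_depth=2,       # at most two levels of splits (default is None)
    max_leaf_nodes=4,  # at most four leaf nodes
    max_features=2,    # consider at most two features per split
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())  # bounded by the settings above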
By reading the Cython source code for the tree builder, we see that the -2s are just dummy values for the leaf nodes' feature split attribute.
Line 63
TREE_UNDEFINED = -2
Line 359
if is_leaf:
# Node is not expandable; set node as leaf
node.left_child = _TREE_LEAF
node.right_child = _TREE_LEAF
node.feature = _TREE_UNDEFINED
node.threshold = _TREE_UNDEFINED
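So when mapping tree_.feature back to feature names, the -2 entries should be skipped rather than used as indices. A minimal sketch (assuming the dt fitted in the question; note that sklearn.tree._tree is a private module, so importing the constant from it is just one option, and hard-coding -2 works equally well):
from sklearn.tree._tree import TREE_UNDEFINED  # == -2, as in the source above

for node_id, feat in enumerate(dt.tree_.feature):
    if feat == TREE_UNDEFINED:
        print('node %d is a leaf (no split feature)' % node_id)
    else:
        print('node %d splits on feature index %d' % (node_id, feat))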