 

Current node to next node feature combinations in decision tree learning: useful to determine potential interactions?

Using some guidance from this scikit-learn tutorial on understanding decision tree structures, I had the idea that perhaps looking at combinations of features occurring between two connected nodes might give some insight as to potential "interaction" terms. That is, by looking at how often a given feature y follows a given feature x, we might be able to determine whether there's some higher order interaction between x and y, versus other variables in the model.

Here is my setup. Basically this object just parses the structure of the tree and makes it easy for us to traverse nodes and determine what's happening at each node.

import numpy as np

class TreeInteractionFinder(object):

    def __init__(
        self,
        model,
        feature_names = None):

        self.model = model
        self.feature_names = feature_names

        self._parse_tree_structure()
        self._node_and_leaf_compute()

    def _parse_tree_structure(self):
        self.n_nodes = self.model.tree_.node_count
        self.children_left = self.model.tree_.children_left
        self.children_right = self.model.tree_.children_right
        self.feature = self.model.tree_.feature
        self.threshold = self.model.tree_.threshold
        self.n_node_samples = self.model.tree_.n_node_samples
        self.predicted_values = self.model.tree_.value

    def _node_and_leaf_compute(self):
        ''' Compute node depth and whether each node is a leaf '''
        node_depth = np.zeros(shape=self.n_nodes, dtype=np.int64)
        is_leaves = np.zeros(shape=self.n_nodes, dtype=bool)
        # Seed is the root node id and its parent depth
        stack = [(0, -1)]
        while stack:
            node_idx, parent_depth = stack.pop()
            node_depth[node_idx] = parent_depth + 1

            # If we have a test (where "test" means decision-test) node
            if self.children_left[node_idx] != self.children_right[node_idx]:
                stack.append((self.children_left[node_idx], parent_depth + 1))
                stack.append((self.children_right[node_idx], parent_depth + 1))
            else:
                is_leaves[node_idx] = True

        self.is_leaves = is_leaves
        self.node_depth = node_depth

Next, I'll train a somewhat-deep tree on some dataset. The Boston housing dataset gave me some interesting results, and thus I've used it in my example:

from sklearn.datasets import load_boston
from sklearn.tree import DecisionTreeRegressor

bunch = load_boston()

X, y = bunch.data, bunch.target
feature_names = bunch.feature_names

model = DecisionTreeRegressor(
    max_depth=20,
    min_samples_leaf=2
)

model.fit(X, y)

finder = TreeInteractionFinder(model, feature_names)

from collections import defaultdict
feature_combos = defaultdict(int)

# Traverse the tree fully, counting the occurrences of features at the current and next indices
for idx in range(finder.n_nodes):
    curr_node_is_leaf = finder.is_leaves[idx]
    curr_feature = finder.feature_names[finder.feature[idx]]
    if not curr_node_is_leaf:
        # Test to see if we're at the end of the tree
        try:
            next_idx = finder.feature[idx + 1]
        except IndexError:
            break
        else:
            next_node_is_leaf = finder.is_leaves[next_idx]
            if not next_node_is_leaf:
                next_feature = finder.feature_names[next_idx]
                feature_combos[frozenset({curr_feature, next_feature})] += 1

from pprint import pprint
pprint(sorted(feature_combos.items(), key=lambda x: -x[1]))
pprint(sorted(zip(feature_names, model.feature_importances_), key=lambda x: -x[1]))

Which yields:

$ python3 *py
[(frozenset({'AGE', 'LSTAT'}), 4),
 (frozenset({'RM', 'LSTAT'}), 3),
 (frozenset({'AGE', 'NOX'}), 3),
 (frozenset({'NOX', 'CRIM'}), 3),
 (frozenset({'NOX', 'DIS'}), 3),
 (frozenset({'LSTAT', 'DIS'}), 2),
 (frozenset({'AGE', 'RM'}), 2),
 (frozenset({'AGE', 'DIS'}), 2),
 (frozenset({'TAX', 'DIS'}), 1),
 (frozenset({'RM', 'INDUS'}), 1),
 (frozenset({'PTRATIO'}), 1),
 (frozenset({'NOX', 'PTRATIO'}), 1),
 (frozenset({'LSTAT', 'CRIM'}), 1),
 (frozenset({'RM'}), 1),
 (frozenset({'TAX', 'PTRATIO'}), 1),
 (frozenset({'NOX'}), 1),
 (frozenset({'DIS', 'CRIM'}), 1),
 (frozenset({'AGE', 'PTRATIO'}), 1),
 (frozenset({'AGE', 'CRIM'}), 1),
 (frozenset({'ZN', 'DIS'}), 1),
 (frozenset({'ZN', 'CRIM'}), 1),
 (frozenset({'CRIM', 'PTRATIO'}), 1),
 (frozenset({'RM', 'CRIM'}), 1)]
[('RM', 0.60067090411997),
 ('LSTAT', 0.22148824141475706),
 ('DIS', 0.068263421165279),
 ('CRIM', 0.03893906506019243),
 ('NOX', 0.028695328014265362),
 ('PTRATIO', 0.014211478583574726),
 ('AGE', 0.012467751974477529),
 ('TAX', 0.011821058983765207),
 ('B', 0.002420619208623876),
 ('INDUS', 0.0008323703650693053),
 ('ZN', 0.00018976111002551332),
 ('CHAS', 0.0),
 ('RAD', 0.0)]

After adding the criterion to exclude "next" nodes that are leaves, the results seem to improve.

Now, the combination of features occurring most frequently is frozenset({'AGE', 'LSTAT'}), that is, the age of the building together with the "% lower status of the population" (whatever that means, presumably a measure of low-income rate). From model.feature_importances_, both LSTAT and AGE are relatively important predictors of sale price, which leads me to believe an explicit AGE * LSTAT interaction might be useful.
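
One rough way I could sanity-check this (just a sketch, not something I've run; it reuses the X, y, and feature_names from above, and random_state is my addition for repeatability) would be to append the explicit AGE * LSTAT product as an extra column and compare cross-validated scores with and without it:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Append the explicit AGE * LSTAT product as an extra feature column
age_idx = list(feature_names).index('AGE')
lstat_idx = list(feature_names).index('LSTAT')
X_aug = np.column_stack([X, X[:, age_idx] * X[:, lstat_idx]])

tree = DecisionTreeRegressor(max_depth=20, min_samples_leaf=2, random_state=0)
print('CV score without the product:', cross_val_score(tree, X, y, cv=5).mean())
print('CV score with AGE * LSTAT   :', cross_val_score(tree, X_aug, y, cv=5).mean())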

Is this even barking up the right tree (pun maybe intended)? Does counting combinations of sequential features in a given tree speak to potential interactions in a model?

asked Nov 05 '19 by blacksite



1 Answer

TL;DR: a decision tree is not the best tool for analyzing the importance of feature combinations.

Like any other algorithm, the decision tree (DT) has its weaknesses. The assumption of the DT algorithm in its basic form is that the features it works with are uncorrelated. Growing a DT is then the process of choosing, from the set of all possible questions (decisions), the one that splits the set of examples in a way that produces maximum gain (according to the chosen loss function, usually the Gini index or information gain). If your features are correlated, you need either to try to decorrelate them (by applying PCA, for example) or to throw some away in a smart way (a procedure called feature selection); otherwise it may cause either poor generalization or too many small leaves. You can read more about it here.
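
For example, a minimal sketch of the PCA route (illustrative only, reusing the X and y from your snippet) could look like this:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Rotate the features onto decorrelated (principal) axes before growing the tree
decorrelated_tree = make_pipeline(
    StandardScaler(),
    PCA(),  # keeping all components is a pure rotation; no information is dropped
    DecisionTreeRegressor(max_depth=20, min_samples_leaf=2)
)
decorrelated_tree.fit(X, y)

The trade-off is that the splits are now made on principal components rather than on the original named features, so interpretability suffers.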

Another issue with DTs is that they were designed to work with categorical data, and we make them work with numerical data by binning it. So on certain features the sheer number of possible questions can be much higher than on other features.
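
You can see this on the Boston data itself (a quick sketch reusing X and feature_names from your snippet): the number of distinct values per feature is a rough proxy for how many candidate thresholds the tree can ask about, and it is far larger for continuous features than for a dummy like CHAS.

import numpy as np

# Distinct values per feature ~ how many split questions the tree can pose on it
for name, column in zip(feature_names, X.T):
    print(name, len(np.unique(column)))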

That said, once your DT is ready, you can gauge the importance of each decision (data falling into a certain range of values): the closer the decision is to the root of the tree, the more important it is. So location matters too, and the number of times a certain combination of features appears in the tree does not directly indicate the importance of that combination. And while some meaningful combinations might surface, their counts will not necessarily be high enough to stand out from the crowd.
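
If you still want to use the counting idea, one rough adjustment (just a sketch, building on the finder object from your question and following the true parent-to-child links) is to weight each parent/child feature pair by how close the parent is to the root:

from collections import defaultdict

weighted_combos = defaultdict(float)
for parent in range(finder.n_nodes):
    if finder.is_leaves[parent]:
        continue
    parent_feat = finder.feature_names[finder.feature[parent]]
    for child in (finder.children_left[parent], finder.children_right[parent]):
        if finder.is_leaves[child]:
            continue
        child_feat = finder.feature_names[finder.feature[child]]
        pair = frozenset({parent_feat, child_feat})
        # Shallower pairs (closer to the root) get larger weights
        weighted_combos[pair] += 1.0 / (1 + finder.node_depth[parent])

print(sorted(weighted_combos.items(), key=lambda kv: -kv[1])[:10])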

answered Sep 25 '22 by igrinis