It is possible to access tree parameters in sklearn with
tree.tree_.children_left
tree.tree_.children_right
tree.tree_.threshold
tree.tree_.feature
and so on
However, trying to write to these attributes raises an AttributeError stating that the attribute is not writable.
Is there any way to modify the learned tree, or to work around the "not writable" AttributeError?
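For example, this minimal sketch triggers the error (the iris data is just a stand-in; any fitted tree behaves the same):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier().fit(*load_iris(return_X_y=True))
clf.tree_.children_left = np.copy(clf.tree_.children_left)
# raises AttributeError: attribute 'children_left' ... is not writable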
We will use DecisionTreeClassifier from sklearn.tree for this purpose. By default, DecisionTreeClassifier performs no pruning and lets the tree grow as deep as it can. With such an unconstrained tree we get accuracy scores of 0.95 on the training part but only 0.63 on the test part; a sketch of this baseline follows.
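A minimal sketch of the unpruned baseline (the breast cancer dataset and the split are arbitrary choices here; the exact scores depend on the data and the random seed):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# no pruning constraints: the tree grows until its leaves are pure
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print('train accuracy:', clf.score(X_train, y_train))  # near-perfect
print('test accuracy: ', clf.score(X_test, y_test))    # noticeably lower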
Pruning not only significantly reduces the size of a decision tree but can also improve its classification accuracy on unseen data. It works by removing parts of the tree that contribute little to classifying instances. Decision trees are among the machine learning models most susceptible to overfitting, and effective pruning reduces that risk.
Overfitting in decision trees is aggravated by other factors as well, such as branches fit to noise and outliers in the data. Pruning is therefore a critical step in building tree-based models that overcome these issues.
These attributes are arrays that cannot be rebound as a whole, but you can still modify their elements in place. Note that this does not shrink the stored data: the detached nodes remain in the underlying arrays, they just become unreachable.
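For instance, continuing from the fitted classifier clf above:

t = clf.tree_
# t.children_left = np.zeros(t.node_count)  # raises AttributeError: not writable
t.children_left[0] = -1   # allowed: in-place element assignment
t.children_right[0] = -1  # the root is now a leaf; predictions use value[0]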
children_left : array of int, shape [node_count]
children_left[i] holds the node id of the left child of node i.
For leaves, children_left[i] == TREE_LEAF. Otherwise,
children_left[i] > i. This child handles the case where
X[:, feature[i]] <= threshold[i].
children_right : array of int, shape [node_count]
children_right[i] holds the node id of the right child of node i.
For leaves, children_right[i] == TREE_LEAF. Otherwise,
children_right[i] > i. This child handles the case where
X[:, feature[i]] > threshold[i].
feature : array of int, shape [node_count]
feature[i] holds the feature to split on, for the internal node i.
threshold : array of double, shape [node_count]
threshold[i] holds the threshold for the internal node i.
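A small traversal sketch shows how these arrays encode the tree (print_tree is a hypothetical helper; clf is any fitted tree estimator, e.g. the one above):

def print_tree(tree, node=0, depth=0):
    indent = '    ' * depth
    if tree.children_left[node] == -1:  # -1 (TREE_LEAF) marks a leaf
        print(indent + 'leaf: value =', tree.value[node].ravel())
    else:
        print(indent + 'if X[:, %d] <= %.3f:'
              % (tree.feature[node], tree.threshold[node]))
        print_tree(tree, tree.children_left[node], depth + 1)
        print(indent + 'else:')
        print_tree(tree, tree.children_right[node], depth + 1)

print_tree(clf.tree_)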
To prune a decision tree by the number of samples in each node, I use the function below. You need to know that the TREE_LEAF constant equals -1.
TREE_LEAF = -1

def prune(decisiontree, min_samples_leaf=1):
    if decisiontree.min_samples_leaf >= min_samples_leaf:
        raise Exception('Tree already more pruned')
    decisiontree.min_samples_leaf = min_samples_leaf
    tree = decisiontree.tree_
    for i in range(tree.node_count):
        n_samples = tree.n_node_samples[i]
        if n_samples <= min_samples_leaf:
            # detach both children: node i is now treated as a leaf
            tree.children_left[i] = TREE_LEAF
            tree.children_right[i] = TREE_LEAF
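As a quick sanity check after calling prune, you can count the leaves that are still reachable, using the TREE_LEAF constant defined above (count_leaves is a hypothetical helper; detached nodes no longer show up, even though tree.node_count is unchanged):

def count_leaves(tree, node=0):
    if tree.children_left[node] == TREE_LEAF:
        return 1
    return (count_leaves(tree, tree.children_left[node])
            + count_leaves(tree, tree.children_right[node]))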
Here is an example which produces graphviz output before and after:
from sklearn.tree import DecisionTreeRegressor as DTR
from sklearn.datasets import load_diabetes
from sklearn.tree import export_graphviz as export

bunch = load_diabetes()
data = bunch.data
target = bunch.target

dtr = DTR(max_depth=4)
dtr.fit(data, target)

# recent sklearn releases expect the fitted estimator here, not dtr.tree_
export(decision_tree=dtr, out_file='before.dot')
prune(dtr, min_samples_leaf=100)
export(decision_tree=dtr, out_file='after.dot')
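To actually render the two files, the standard Graphviz command line works, e.g. dot -Tpng before.dot -o before.png (assuming Graphviz is installed); comparing the two images makes the removed subtrees easy to spot.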