
How can I plot a graphviz decision tree with inverse-transformed (actual) values?

I am using graphviz to plot a classification decision tree.

Before fitting, I scale the features with "preprocessing.StandardScaler()".

As a result, the plotted decision tree shows thresholds based on the transformed values.

Is there a way to "inverse_transform" the classifier before plotting it, so that the nodes show the actual values rather than the transformed ones?

Yes, I have tried scale.inverse_transform(rf_clf), but of course it doesn't work.

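Import the libraries and load the dataset from sklearn.datasets

import numpy as np
import numpy.random as nr
import pandas as pd
import graphviz
import sklearn.model_selection as ms
from sklearn import datasets, preprocessing, tree

iris = datasets.load_iris()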

Create a data frame from the dictionary

species = [iris.target_names[x] for x in iris.target]
iris = pd.DataFrame(iris['data'], columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris['Species'] = species

converting to arrays

Features = np.array(iris[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']])

levels = {'setosa':0, 'versicolor':1, 'virginica':2}
Labels =  np.array([levels[x] for x in iris['Species']])

splitting

nr.seed(1115)
indx = range(Features.shape[0])
indx = ms.train_test_split(indx, test_size = 100)
X_train = Features[indx[0],:]
y_train = np.ravel(Labels[indx[0]])
X_test = Features[indx[1],:]
y_test = np.ravel(Labels[indx[1]])

scaling:

scale = preprocessing.StandardScaler()
scale.fit(X_train)
X_train = scale.transform(X_train)

fitting the classifier

rf_clf = tree.DecisionTreeClassifier() ###simple TREE
rf_clf.fit(X_train, y_train)

plotting the decision tree with graphviz:

dot_data = tree.export_graphviz(rf_clf, out_file=None,
                                feature_names=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'],
                                class_names=['setosa', 'versicolor', 'virginica'],
                                filled=True, rounded=True,
                                special_characters=True)

print(dot_data)

graph = graphviz.Source(dot_data)  
graph 

The first node shows "Petal_Width <= 0.53" and the second node shows "Petal_Length <= -0.788", which is a negative value for a physical quantity.

I would prefer the tree to show the real values in the original units (centimeters for the iris measurements).

Asked by CRAZYDATA, Dec 03 '25 16:12


1 Answer

You could traverse the tree and set the value of the node threshold yourself.

If you consider this example for traversing the tree: https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py

Where it says print("%snode=%s test node: go to node %s if X[:, %s] <= %s else to node %s."... you could overwrite the threshold, using the scaler's inverse_transform function for the feature under test.

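# Run this for split nodes only (leaves have no feature or threshold).
# inverse_transform expects a 2D array: one row with a slot per feature.
transformed = np.full((1, X_train.shape[1]), np.nan)
transformed[0, feature[i]] = threshold[i]
threshold[i] = scale.inverse_transform(transformed)[0, feature[i]]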

Your generated dot file will contain the updated values. You won't be able to use the tree for prediction anymore with the scaled features though.
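Putting the pieces together, here is a minimal end-to-end sketch of that idea on the iris data. The helper name unscale_thresholds is hypothetical (it is not part of scikit-learn), the split uses train_test_split with a fixed random_state instead of the question's index-based split, and the filler columns are zeros rather than NaN, which makes no difference because StandardScaler treats each column independently. It relies on clf.tree_.threshold being writable in place, as the snippet above already assumes.

```python
import numpy as np
from sklearn import datasets, preprocessing, tree
from sklearn.model_selection import train_test_split


def unscale_thresholds(clf, scaler):
    """Overwrite every split threshold of a fitted tree with its value in original units.

    Hypothetical helper, not a scikit-learn function. Edits clf.tree_ in place.
    """
    t = clf.tree_
    for i in range(t.node_count):
        if t.children_left[i] == t.children_right[i]:
            continue  # leaf node: nothing to convert
        f = t.feature[i]
        row = np.zeros((1, t.n_features))  # one row; the other columns are ignored
        row[0, f] = t.threshold[i]
        t.threshold[i] = scaler.inverse_transform(row)[0, f]
    return clf


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=100, random_state=1115
)

# Fit the scaler and the tree on scaled features, as in the question.
scaler = preprocessing.StandardScaler().fit(X_train)
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(scaler.transform(X_train), y_train)
pred_scaled = clf.predict(scaler.transform(X_test))

# Convert the thresholds, then export: the dot source now shows original units.
unscale_thresholds(clf, scaler)
pred_raw = clf.predict(X_test)  # the modified tree now expects unscaled input

dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'],
    class_names=['setosa', 'versicolor', 'virginica'],
    filled=True, rounded=True, special_characters=True,
)
print(dot_data)
```

Passing dot_data to graphviz.Source(dot_data) renders it as in the question. Since the conversion is destructive, it may be safer to apply it to a copy.deepcopy of the classifier and keep the original for predictions on scaled data.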

Note: the resulting thresholds aren't exactly the same as those of a tree fitted without scaling. I'm not sure whether the scaler should influence the thresholds like that.
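For reference, a fitted StandardScaler applies a per-feature affine map, so the inverse of a single scaled threshold t on feature f should equal t * scale_[f] + mean_[f]. The sketch below checks this by hand. It assumes the default with_mean=True and with_std=True, fits the scaler on the full iris data rather than the question's training split, and uses 0.53 only as an illustrative scaled value. It does not by itself explain the difference mentioned in the note.

```python
import numpy as np
from sklearn import datasets, preprocessing

X = datasets.load_iris().data
scaler = preprocessing.StandardScaler().fit(X)

f = 3            # column index of Petal_Width
t_scaled = 0.53  # illustrative threshold in scaled units

# Route 1: let the scaler invert a one-row array holding the threshold.
row = np.zeros((1, X.shape[1]))
row[0, f] = t_scaled
via_scaler = scaler.inverse_transform(row)[0, f]

# Route 2: apply the affine map by hand.
by_hand = t_scaled * scaler.scale_[f] + scaler.mean_[f]

print(via_scaler, by_hand)
```

Because scale_ is positive, the map preserves the ordering of values within each feature.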

Answered by TomVW, Dec 06 '25 06:12