
Visualizing scikit-learn/sklearn multi-output decision tree regression in PNG or PDF

This is the first question I'm posting on Stack Overflow, so I apologize for any mishaps in layout and so on (advice welcome). Your help is much appreciated!

I'm trying to visualize the output of DecisionTreeRegressor with multiple outputs (as described in http://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression_multioutput.html#example-tree-plot-tree-regression-multioutput-py) in png or pdf format using pydot.

The code I tried looks like this:

...
dtreg = tree.DecisionTreeRegressor(max_depth=3)
dtreg.fit(x,y)

tree.export_graphviz(dtreg, out_file='tree.dot')  # write the dot file to disk

dot_data = StringIO()
tree.export_graphviz(dtreg, out_file=dot_data)
print dot_data.getvalue()
pydot.graph_from_dot_data(dot_data.getvalue()).write_pdf("pydot_try.pdf") 

Writing the pdf gives the following errors:

pydot.InvocationException: Program terminated with status: 1. stderr follows:
Warning: /tmp/tmpAy7d59:7: string ran past end of line
Error: /tmp/tmpAy7d59:8: syntax error near line 8
context: >>> [ <<< 0.20938667]
Warning: /tmp/tmpAy7d59:18: string ran past end of line
Warning: /tmp/tmpAy7d59:20: string ran past end of line

and so on with more "string ran past end of line" errors.

I've never worked with .dot before, but I suspect there might be a problem with the multi-output format. For example, part of the tree looks like this:

digraph Tree {
0 [label="X[0] <= 56.0000\nmse = 0.0149315126135\nsamples = 41", shape="box"] ;
1 [label="X[0] <= 40.0000\nmse = 0.0137536911947\nsamples = 25", shape="box"] ;
0 -> 1 ;
2 [label="X[0] <= 24.0000\nmse = 0.0152142545276\nsamples = 21", shape="box"] ;
1 -> 2 ;
3 [label="mse = 0.0140\nsamples = 15\nvalue = [[ 0.83384667]
 [ 0.20938667]
 [ 0.08511333]
 [ 0.04234667]
 [ 0.08158   ]
 [ 0.17948667]
 [ 0.03616   ]
 [ 0.00995333]
 [ 0.99529333]
 [ 0.13715333]
 [ 0.10294667]
 [ 0.06632667]]", shape="box"] ;
2 -> 3 ;
4 [label="mse = 0.0170\nsamples = 6\nvalue = [[ 0.69588333]
 [ 0.20275   ]
 [ 0.0953    ]
 [ 0.0436    ]
 [ 0.1216    ]
 [ 0.17248333]
 [ 0.04393333]
 [ 0.01178333]
 [ 0.99913333]
 [ 0.12348333]
 [ 0.10838333]
 [ 0.06973333]]", shape="box"] ;
2 -> 4 ;
}

I don't know how to solve this, because that's just the output I get from DecisionTreeRegressor.

I also tried converting the dot file:

dot -Tpng tree.dot -o tree.png

But this gives the same errors (string ran past end of line). I also tried opening tree.dot in xdot, and that gave the same error.

CSquare asked Nov 10 '22 20:11


1 Answer

Follow the instructions below to view the decision tree.

• Using sklearn, we can export the tree in ‘dot’ format. A ‘dot’ file is a plain text file.

• A ‘dot’ file can be converted to an image file using the ‘graphviz’ utility.

• Download ‘graphviz.msi’ from the website: http://www.graphviz.org/Download_windows.php

• Ensure that ‘\graphviz\bin’ is added to the ‘path’ environment variable.
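
Whether the path step above took effect can be checked from Python before trying any conversion. This is only a minimal sketch: it assumes the Graphviz executable is named dot (dot.exe on Windows), and the helper name dot_on_path is made up for illustration.

```python
import shutil


def dot_on_path(which=shutil.which):
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH.

    The `which` parameter exists only so the lookup can be swapped out in tests;
    by default the standard library's shutil.which is used.
    """
    return which("dot")


if __name__ == "__main__":
    location = dot_on_path()
    if location is None:
        print("dot was not found; check that the Graphviz bin directory is on PATH")
    else:
        print("dot found at", location)
```

If this reports that dot was not found, open a new command prompt after changing the environment variable, since an already open prompt keeps the old path.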

A ‘dot’ file can be exported with the sklearn module using the following commands:

from sklearn import tree
# clf is the already fitted tree estimator
tree.export_graphviz(clf, out_file='tree.dot')

In the command prompt, execute the following to convert the ‘.dot’ file to a ‘.png’ file:

 dot -Tpng tree.dot -o tree.png
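
If dot still stops with "string ran past end of line", as in the question, the likely cause is that the multi-output value arrays put raw line breaks inside the quoted labels, which the installed Graphviz version apparently does not accept. One possible workaround is to rewrite those line breaks as the two-character escape \n before converting. The sketch below uses only the standard library and is not part of sklearn or pydot; the function name escape_label_newlines is hypothetical, and it assumes the only problem in the file is raw line breaks inside double-quoted strings.

```python
def escape_label_newlines(dot_text):
    """Replace raw line breaks inside double-quoted strings with a backslash followed by n."""
    out = []
    in_quotes = False
    i = 0
    while i < len(dot_text):
        ch = dot_text[i]
        if ch == "\\" and in_quotes and i + 1 < len(dot_text):
            # Copy an existing escape sequence (such as \n or \") unchanged.
            out.append(dot_text[i:i + 2])
            i += 2
            continue
        if ch == '"':
            # Entering or leaving a quoted string.
            in_quotes = not in_quotes
        if ch == "\n" and in_quotes:
            # A raw line break inside a label becomes the escape sequence.
            out.append("\\n")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

To use it, read tree.dot into a string, pass it through escape_label_newlines, write the result to a new file (for example tree_fixed.dot, a name chosen here only as an example), and run the dot command above on that file. Line breaks outside quoted strings are left untouched, so the statement layout of the file is preserved.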

Praveen Gupta Sanka answered Nov 14 '22 23:11