
How to visualize an sklearn GradientBoostingClassifier?

I've trained a gradient boosting classifier, and I would like to visualize it using the export_graphviz tool shown here.

When I try it I get:

AttributeError: 'GradientBoostingClassifier' object has no attribute 'tree_'

This is because export_graphviz is meant for single decision trees, but I guess there is still a way to visualize the model, since the gradient boosting classifier must be built on underlying decision trees.

Does anybody know how to do that?

Asked by Carlos Pinzón, Jul 07 '17 15:07




1 Answer

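The attribute estimators_ contains the underlying decision trees. It is an array of shape (n_estimators, K), where K is 1 for binary classification and the number of classes otherwise. The following code displays one of the trees of a trained GradientBoostingClassifier. Note that although the ensemble as a whole is a classifier, each individual tree is a regression tree (DecisionTreeRegressor) and outputs floating-point values.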

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_graphviz
import numpy as np

# Fictitious data
np.random.seed(0)
X = np.random.normal(0,1,(1000, 3))
y = X[:,0]+X[:,1]*X[:,2] > 0

# Classifier
clf = GradientBoostingClassifier(max_depth=3, random_state=0)
clf.fit(X[:600], y[:600])

# Get tree number 42 (column 0 is the only tree per stage in binary classification)
sub_tree_42 = clf.estimators_[42, 0]

# Visualization
# Requires the Graphviz binaries (https://www.graphviz.org/download/)
# and the pydotplus Python package
from pydotplus import graph_from_dot_data
from IPython.display import Image
dot_data = export_graphviz(
    sub_tree_42,
    out_file=None, filled=True, rounded=True,
    special_characters=True,
    proportion=False, impurity=False, # enable them if you want
)
graph = graph_from_dot_data(dot_data)
Image(graph.create_png())

Tree number 42:

Code output (decision tree image)
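
If Graphviz is not available, the same sub-tree can be inspected as plain text. The sketch below is not part of the original answer: it refits the same model, confirms that each stage holds a DecisionTreeRegressor, prints tree 42 with sklearn.tree.export_text, and checks that the classifier's raw score is a constant offset plus learning_rate times the sum of the tree outputs. The feature labels x0, x1, x2 are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

# Same fictitious data and model settings as in the answer above
np.random.seed(0)
X = np.random.normal(0, 1, (1000, 3))
y = X[:, 0] + X[:, 1] * X[:, 2] > 0

clf = GradientBoostingClassifier(max_depth=3, random_state=0)
clf.fit(X[:600], y[:600])

# One regression tree per boosting stage for a binary problem
print(clf.estimators_.shape)      # (100, 1) with the default n_estimators=100
tree = clf.estimators_[42, 0]
print(type(tree).__name__)        # DecisionTreeRegressor

# Plain-text rendering of tree 42; x0..x2 are hypothetical feature labels
text = export_text(tree, feature_names=["x0", "x1", "x2"])
print(text)

# Each tree outputs floats; the ensemble's raw score (decision_function)
# should equal a constant initial value plus the scaled sum of tree outputs.
X_test = X[600:]
raw = clf.decision_function(X_test)
tree_sum = clf.learning_rate * sum(t.predict(X_test) for t in clf.estimators_[:, 0])
offset = raw - tree_sum
print(np.allclose(offset, offset[0]))
```

The leaf values printed by export_text are therefore contributions to a log-odds score, not class labels, which is why the picture above shows real numbers at the leaves.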

Answered by Carlos Pinzón, Sep 16 '22 13:09
