I am a beginner in machine learning and am experimenting with decision trees. I am looking at this visualization of a decision tree http://scikit-learn.org/dev/_images/iris.svg and wondering what the error value signifies. Is it the Gini index, the information gain, or something else? I would also appreciate an explanation of what it intuitively means.
In this concrete example, the "error" of a node is the Gini Index of all examples that reached that node.
In general, the "error" of a node depends on the chosen impurity criterion (e.g. Gini or entropy for classification, and mean squared error for regression).
Intuitively, you can think of both impurity criteria (Gini and entropy) as a measure of how homogeneous a multiset is. A multiset is homogeneous if it contains mostly elements of one type (such a set is also called "pure", hence the name "impurity criterion"). In our case, the elements of the multiset are the class labels of the examples that reach the corresponding node. When we split a node, we want the resulting partitions to be pure, meaning that the classes are well separated (each partition contains mostly instances of one class).
In the case of criterion="entropy" and binary classification, an error of 1.0 means that the node contains an equal number of positive and negative examples (the most inhomogeneous multiset possible).
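To make this concrete, here is a small sketch that computes both impurity measures directly from a multiset of class labels (the function names `gini` and `entropy` are just illustrative, not part of scikit-learn's API):

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(gini(["a"] * 10))         # a pure node -> 0.0
print(gini(["a", "b"] * 5))     # balanced binary node -> 0.5
print(entropy(["a", "b"] * 5))  # balanced binary node -> 1.0
```

Note how a pure node has impurity 0 under either criterion, and a perfectly balanced binary node has an entropy of exactly 1.0, matching the statement above.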
You can access the tree data structure that underlies a DecisionTreeClassifier or DecisionTreeRegressor via its tree_ attribute, which holds an object of the extension type sklearn.tree._tree.Tree. This object represents the tree as a set of parallel numpy arrays. The array init_error holds the initial error of each node; best_error holds the sum of the errors of the two partitions if the node is a splitting node.
See the class documentation at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L45 for more details.
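As a minimal sketch of inspecting these arrays: note that the attribute names have changed across scikit-learn versions, and in recent releases the per-node impurity is exposed as tree_.impurity rather than init_error/best_error. Assuming a recent version:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

tree = clf.tree_  # sklearn.tree._tree.Tree: parallel numpy arrays
print(tree.node_count)          # total number of nodes in the tree
print(tree.impurity[0])         # impurity of the root node (all 150 samples)
print(tree.children_left[:5])   # child indices; -1 marks a leaf
```

For the iris data the root node sees 50 examples of each of the three classes, so its Gini impurity is 1 - 3 * (1/3)^2 = 2/3 ≈ 0.667.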