I have trained a random forest model using scikit-learn and now I want to save its tree structures in a text file so I can use it elsewhere. According to this link a tree object consist of a number of parallel arrays each one hold some information about different nodes of the tree (ex. left child, right child, what feature it examines,...) . However there seems to be no information about the class label corresponding to each leaf node! It's even not mentioned in the examples provided in the link above.
Does anyone know where are the class labels stored in the scikit-learn decision tree structure?
All decision trees use np. float32 arrays internally. If training data is not in this format, a copy of the dataset will be made.
Sklearn Module − The Scikit-learn library provides the module name DecisionTreeRegressor for applying decision trees on regression problems.
Each leaf node represents a class. It does not require any domain knowledge. It is easy to comprehend. The learning and classification steps of a decision tree are simple and fast.
Take a look at the docs for sklearn.tree.DecisionTreeClassifier.tree_.value
:
from sklearn.datasets import load_iris
from sklearn.cross_validation import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf.fit(iris.data, iris.target)
print(clf.classes_)
[0, 1, 2]
print(clf.tree_.value)
[[[ 50. 50. 50.]]
[[ 50. 0. 0.]]
[[ 0. 50. 50.]]
[[ 0. 49. 5.]]
[[ 0. 47. 1.]]
[[ 0. 47. 0.]]
[[ 0. 0. 1.]]
[[ 0. 2. 4.]]
[[ 0. 0. 3.]]
[[ 0. 2. 1.]]
[[ 0. 2. 0.]]
[[ 0. 0. 1.]]
[[ 0. 1. 45.]]
[[ 0. 1. 2.]]
[[ 0. 0. 2.]]
[[ 0. 1. 0.]]
[[ 0. 0. 43.]]]
Each row in clf.tree_.value
"contains the constant prediction value of each node," (help(clf.tree_)
) which corresponds index-to-index to clf.classes_
.
See this answer for (barely) more details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With