In the generated decision tree regression model, there is an MSE attribute when using graphviz
to view the tree structure. I need to obtain the MSE of each leaf node, and carry out subsequent operations according to the MSE. However, after reading the document, I can't find the method to provide for output MSE. Other attributes such as feature name, sample number, prediction value, etc. All have are corresponding methods:
With help(sklearn.tree._tree.Tree)
, I can see that most of the attributes have some methods to output the value, but I don't see anything about MSE.
Help on class Tree in module sklearn.tree._tree
MSE is the average squared difference between the actual data values and where the data point would be on the proposed line. The tree runs an algorithm that finds the line that results in the smallest MSE.
There is a way to measure the accuracy of a regression task. That is to transform it into a classification task. The first approach is to make the model output prediction interval instead of a number. This is especially possible with decision trees, but it's better to use Quantile Decision Trees.
Friedman mse, mse, mae. the descriptions provided by sklearn are: The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error.
max_depth: This determines the maximum depth of the tree. In our case, we use a depth of two to make our decision tree. The default value is set to none. This will often result in over-fitted decision trees.
Nice question. You need tree_reg.tree_.impurity
.
tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)
extracted_MSEs = tree_reg.tree_.impurity # The Hidden magic is HERE
for idx, MSE in enumerate(tree_reg.tree_.impurity):
print("Node {} has MSE {}".format(idx,MSE))
Node 0 has MSE 86.873403833
Node 1 has MSE 40.3211827171
Node 2 has MSE 25.6934820064
Node 3 has MSE 19.0053469592
Node 4 has MSE 74.6839429717
Node 5 has MSE 38.3057346817
Node 6 has MSE 39.6709615385
boston
dataset with visual output:import pandas as pd
import numpy as np
from sklearn import ensemble, model_selection, metrics, datasets, tree
import graphviz
house_prices = datasets.load_boston()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
pd.DataFrame(house_prices.data, columns=house_prices.feature_names),
pd.Series(house_prices.target, name="med_price"),
test_size=0.20, random_state=42)
tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)
extracted_MSEs = tree_reg.tree_.impurity # YOU NEED THIS
print(extracted_MSEs)
#[86.87340383 40.32118272 25.69348201 19.00534696 74.68394297 38.30573468 39.67096154]
# Compare visually
dot_data = tree.export_graphviz(tree_reg, out_file=None, feature_names=X_train.columns)
graph = graphviz.Source(dot_data)
#this will create an boston.pdf file with the rule path
graph.render("boston")
Compare MSE values with visual Output:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With