Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the MSE of the node in the DecisionTreeRegressor of scikit-learn?

In the generated decision tree regression model, there is an MSE attribute when using graphviz to view the tree structure. I need to obtain the MSE of each leaf node, and carry out subsequent operations according to the MSE. However, after reading the document, I can't find the method to provide for output MSE. Other attributes such as feature name, sample number, prediction value, etc. All have are corresponding methods:

Tree structure

With help(sklearn.tree._tree.Tree), I can see that most of the attributes have some methods to output the value, but I don't see anything about MSE.

Help on class Tree in module sklearn.tree._tree Help on class Tree in module sklearn.tree._tree

like image 583
Cosmic Roach Avatar asked Dec 17 '19 13:12

Cosmic Roach


People also ask

What is MSE in decision tree?

MSE is the average squared difference between the actual data values and where the data point would be on the proposed line. The tree runs an algorithm that finds the line that results in the smallest MSE.

How do you check the accuracy of a decision tree Regressor?

There is a way to measure the accuracy of a regression task. That is to transform it into a classification task. The first approach is to make the model output prediction interval instead of a number. This is especially possible with decision trees, but it's better to use Quantile Decision Trees.

What is Friedman MSE?

Friedman mse, mse, mae. the descriptions provided by sklearn are: The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error.

What is Max_depth in the decision tree?

max_depth: This determines the maximum depth of the tree. In our case, we use a depth of two to make our decision tree. The default value is set to none. This will often result in over-fitted decision trees.


1 Answers

Nice question. You need tree_reg.tree_.impurity.

Short answer:

tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)

extracted_MSEs = tree_reg.tree_.impurity # The Hidden magic is HERE

for idx, MSE in enumerate(tree_reg.tree_.impurity):
    print("Node {} has MSE {}".format(idx,MSE))

Node 0 has MSE 86.873403833
Node 1 has MSE 40.3211827171
Node 2 has MSE 25.6934820064
Node 3 has MSE 19.0053469592
Node 4 has MSE 74.6839429717
Node 5 has MSE 38.3057346817
Node 6 has MSE 39.6709615385

Long answer using the boston dataset with visual output:

import pandas as pd
import numpy as np
from sklearn import ensemble, model_selection, metrics, datasets, tree
import graphviz

house_prices = datasets.load_boston()

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    pd.DataFrame(house_prices.data, columns=house_prices.feature_names),
    pd.Series(house_prices.target, name="med_price"),
    test_size=0.20, random_state=42)

tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)

extracted_MSEs = tree_reg.tree_.impurity # YOU NEED THIS 
print(extracted_MSEs)
#[86.87340383 40.32118272 25.69348201 19.00534696 74.68394297 38.30573468 39.67096154]

# Compare visually
dot_data = tree.export_graphviz(tree_reg, out_file=None, feature_names=X_train.columns)
graph = graphviz.Source(dot_data)

#this will create an boston.pdf file with the rule path
graph.render("boston")

Compare MSE values with visual Output:

enter image description here

like image 186
seralouk Avatar answered Sep 29 '22 00:09

seralouk