
Convert regression tree output to pandas table

Tags: python, pandas

This code fits a regression tree in Python. I want to convert its text-based output to a table.

I have looked into this (Convert a decision tree to a table); however, the given solution doesn't work.

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn import tree

dataset = np.array( 
[['Asset Flip', 100, 1000], 
['Text Based', 500, 3000], 
['Visual Novel', 1500, 5000], 
['2D Pixel Art', 3500, 8000], 
['2D Vector Art', 5000, 6500], 
['Strategy', 6000, 7000], 
['First Person Shooter', 8000, 15000], 
['Simulator', 9500, 20000], 
['Racing', 12000, 21000], 
['RPG', 14000, 25000], 
['Sandbox', 15500, 27000], 
['Open-World', 16500, 30000], 
['MMOFPS', 25000, 52000], 
['MMORPG', 30000, 80000] 
]) 

X = dataset[:, 1:2].astype(int)

y = dataset[:, 2].astype(int)  

regressor = DecisionTreeRegressor(random_state = 0) 

regressor.fit(X, y) 

text_rule = tree.export_text(regressor )

print(text_rule)

The output I am getting looks like this:

|--- feature_0 <= 20750.00
|   |--- feature_0 <= 7000.00
|   |   |--- feature_0 <= 1000.00
|   |   |   |--- feature_0 <= 300.00
|   |   |   |   |--- value: [1000.00]
|   |   |   |--- feature_0 >  300.00
|   |   |   |   |--- value: [3000.00]
|   |   |--- feature_0 >  1000.00
|   |   |   |--- feature_0 <= 2500.00
|   |   |   |   |--- value: [5000.00]
|   |   |   |--- feature_0 >  2500.00
|   |   |   |   |--- feature_0 <= 4250.00
|   |   |   |   |   |--- value: [8000.00]
|   |   |   |   |--- feature_0 >  4250.00
|   |   |   |   |   |--- feature_0 <= 5500.00
|   |   |   |   |   |   |--- value: [6500.00]
|   |   |   |   |   |--- feature_0 >  5500.00
|   |   |   |   |   |   |--- value: [7000.00]
|   |--- feature_0 >  7000.00
|   |   |--- feature_0 <= 13000.00
|   |   |   |--- feature_0 <= 8750.00
|   |   |   |   |--- value: [15000.00]
|   |   |   |--- feature_0 >  8750.00
|   |   |   |   |--- feature_0 <= 10750.00
|   |   |   |   |   |--- value: [20000.00]
|   |   |   |   |--- feature_0 >  10750.00
|   |   |   |   |   |--- value: [21000.00]
|   |   |--- feature_0 >  13000.00
|   |   |   |--- feature_0 <= 16000.00
|   |   |   |   |--- feature_0 <= 14750.00
|   |   |   |   |   |--- value: [25000.00]
|   |   |   |   |--- feature_0 >  14750.00
|   |   |   |   |   |--- value: [27000.00]
|   |   |   |--- feature_0 >  16000.00
|   |   |   |   |--- value: [30000.00]
|--- feature_0 >  20750.00
|   |--- feature_0 <= 27500.00
|   |   |--- value: [52000.00]
|   |--- feature_0 >  27500.00
|   |   |--- value: [80000.00]

I want to convert these rules into a pandas table similar to the following form. How can I do this?

[image: example of the desired table]

The plot version of the rules looks something like this (for reference). Please note that in the table I have shown only the leftmost part of the rules.

[image: plot of the fitted tree]

Soumya Boral asked Nov 07 '22 05:11


1 Answer

Modifying the code from the linked answer:

import sklearn.tree  # import the submodule explicitly so sklearn.tree._tree resolves
import pandas as pd

def tree_to_df(reg_tree, feature_names):
    tree_ = reg_tree.tree_
    feature_name = [
        feature_names[i] if i != sklearn.tree._tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    
    def recurse(node, row, ret):
        if tree_.feature[node] != sklearn.tree._tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            # Add rule to row and search left branch
            row[-1].append(name + " <= " +  str(threshold))
            recurse(tree_.children_left[node], row, ret)
            # Add rule to row and search right branch
            row[-1].append(name + " > " +  str(threshold))
            recurse(tree_.children_right[node], row, ret)
        else:
            # Add output rules and start a new row
            label = tree_.value[node]
            ret.append("return " + str(label[0][0]))
            row.append([])
    
    # Initialize
    rules = [[]]
    vals = []
    
    # Call recursive function with initial values
    recurse(0, rules, vals)
    
    # Convert to table and output
    df = pd.DataFrame(rules).dropna(how='all')
    df['Return'] = pd.Series(vals)
    return df

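Called with the fitted regressor and a list of feature names, for example tree_to_df(regressor, ['feature']), this returns a pandas DataFrame: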

                     0                   1                   2                 3          Return
0   feature <= 20750.0   feature <= 7000.0   feature <= 1000.0  feature <= 300.0   return 1000.0
1      feature > 300.0                None                None              None   return 3000.0
2     feature > 1000.0   feature <= 2500.0                None              None   return 5000.0
3     feature > 2500.0   feature <= 4250.0                None              None   return 8000.0
4     feature > 4250.0   feature <= 5500.0                None              None   return 6500.0
5     feature > 5500.0                None                None              None   return 7000.0
6     feature > 7000.0  feature <= 13000.0   feature <= 8750.0              None  return 15000.0
7     feature > 8750.0  feature <= 10750.0                None              None  return 20000.0
8    feature > 10750.0                None                None              None  return 21000.0
9    feature > 13000.0  feature <= 16000.0  feature <= 14750.0              None  return 25000.0
10   feature > 14750.0                None                None              None  return 27000.0
11   feature > 16000.0                None                None              None  return 30000.0
12   feature > 20750.0  feature <= 27500.0                None              None  return 52000.0
13   feature > 27500.0                None                None              None  return 80000.0
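
Note that in the table above, each row after the first lists only the conditions added since the previous leaf, not the full path from the root. If one row per leaf with the complete path is preferred, a minimal sketch could look like the following. The function name tree_paths_to_df, the rule_N column names and the "feature" label are illustrative choices of mine, not part of scikit-learn or the linked answer; the data is the numeric part of the question's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, _tree


def tree_paths_to_df(reg_tree, feature_names):
    """Hypothetical helper: one row per leaf with the full root-to-leaf path."""
    tree_ = reg_tree.tree_
    paths, values = [], []

    def recurse(node, path):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: pass an extended copy of the path down each branch
            name = feature_names[tree_.feature[node]]
            threshold = float(tree_.threshold[node])
            recurse(tree_.children_left[node], path + [f"{name} <= {threshold}"])
            recurse(tree_.children_right[node], path + [f"{name} > {threshold}"])
        else:
            # Leaf: store the finished path and the predicted value
            paths.append(path)
            values.append(float(tree_.value[node][0][0]))

    recurse(0, [])
    df = pd.DataFrame(paths)  # shorter paths are padded with missing values
    df.columns = [f"rule_{i}" for i in df.columns]
    df["Return"] = values
    return df


# Numeric columns of the question's dataset
X = np.array([100, 500, 1500, 3500, 5000, 6000, 8000, 9500,
              12000, 14000, 15500, 16500, 25000, 30000]).reshape(-1, 1)
y = np.array([1000, 3000, 5000, 8000, 6500, 7000, 15000, 20000,
              21000, 25000, 27000, 30000, 52000, 80000])

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)
df = tree_paths_to_df(regressor, ["feature"])
print(df)
```

Because each recursive call receives its own copy of the path, no state is shared between branches, so every row starts at the root condition and ends at the leaf value.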
quizzical_panini answered Nov 14 '22 23:11