
How to store a decision tree

I've tried several different methods, some of which I found on here, including making a Node class and using nested dictionaries, but I can't seem to get them to work.

My code currently takes in several lines of DNA (a,t,g,c) and stores them as a numpy array. It then finds the attribute that gives the most gain and splits the data into 4 new numpy arrays (depending on whether an a, t, g, or c is present at that attribute).

I'm unable to write a recursive function that can build the tree. I'm quite new to Python and to programming in general, so please describe in detail what I should do.

Thanks for any help

user3312146 asked Feb 15 '14 00:02





1 Answer

If you want to implement a decision tree from scratch, I recommend building the tree with classes. A tree is composed of nodes: each internal node contains further nodes recursively, and leaves are the terminal nodes. For a binary tree, the classes can look something like this:

class Node(object):
    def __init__(self):
        self.split_variable = None
        self.left_child = None
        self.right_child = None

    def get_name(self):
        return 'Node'

class Leaf(object):
    def __init__(self):
        self.value = None

    def get_name(self):
        return 'Leaf'

For the Node class: 'split_variable' holds the name of the variable used in the split (i.e. one of [a,t,g,c]), and 'left_child' and 'right_child' are new instances of Node or Leaf. The True/False presence of that variable maps to the left/right child. (For a regression tree you'll need to add a fourth attribute, 'split_value', to the Node class and map values less/greater than it to the left/right children.)
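As a rough illustration of how these two classes fit together, here is a minimal sketch that wires up a small tree by hand. The helper functions make_leaf and make_node and the labels 'class_1' to 'class_3' are made up for this example; only the Node and Leaf classes come from the code above.

```python
class Node(object):
    def __init__(self):
        self.split_variable = None
        self.left_child = None
        self.right_child = None

    def get_name(self):
        return 'Node'


class Leaf(object):
    def __init__(self):
        self.value = None

    def get_name(self):
        return 'Leaf'


def make_leaf(value):
    # Hypothetical helper: create a Leaf and set its value in one step.
    leaf = Leaf()
    leaf.value = value
    return leaf


def make_node(split_variable, left_child, right_child):
    # Hypothetical helper: create a Node and attach both children.
    node = Node()
    node.split_variable = split_variable
    node.left_child = left_child      # taken when the variable is present
    node.right_child = right_child    # taken when the variable is absent
    return node


# Is 'a' present? Yes: 'class_1'. No: ask whether 't' is present.
tree = make_node(
    'a',
    make_leaf('class_1'),
    make_node('t', make_leaf('class_2'), make_leaf('class_3')),
)
```

Because a child can be either a Node or a Leaf, the same two classes can describe a tree of any depth.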

For the Leaf class: 'value' holds the prediction assigned to that leaf for the tree's target variable (i.e. the majority class for a discrete target, or the mean for a continuous one).
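The two ways of choosing a leaf value could be computed roughly as follows. This is only a sketch using the standard library; the function names majority_value and mean_value are made up for this example.

```python
from collections import Counter
from statistics import mean


def majority_value(labels):
    """Most common label among the training rows that reached the leaf."""
    return Counter(labels).most_common(1)[0][0]


def mean_value(targets):
    """Average target among the training rows that reached the leaf."""
    return mean(targets)
```

Whatever these return is what you would store in the leaf's 'value' attribute when you build the tree.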

To complete your implementation you'll need functions that walk the tree to evaluate and/or visualise it. These functions call themselves recursively until the whole tree has been walked. This is where the get_name() methods of the classes come in handy, to tell nodes from leaves. How you implement this part really depends on how you store your data; I suggest using pandas DataFrames, which are like tables. A sample evaluate function could be (pseudocode):

def evaluate_tree(your_data, node):
    # Choose the branch: left if the split variable is present (truthy), right otherwise.
    if your_data[node.split_variable]:
        child = node.left_child
    else:
        child = node.right_child
    # A Leaf ends the walk and gives the prediction.
    if child.get_name() == 'Leaf':
        return child.value
    # Otherwise keep descending; the result must be returned, or the caller gets None.
    return evaluate_tree(your_data, child)
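Since the question asks for a recursive function that builds the tree, here is one possible sketch, not a definitive implementation. It assumes the data is a list of (sequence, label) pairs with sequences of equal length, that every split is a yes/no test of the form "is this base at this position?", and that splits are chosen by information gain. The functions entropy, make_leaf, best_split, build_tree and classify, and the sample rows, are all made up for this example.

```python
import math
from collections import Counter


class Node(object):
    def __init__(self):
        self.split_variable = None  # here: a (position, base) pair
        self.left_child = None      # branch taken when the test is True
        self.right_child = None     # branch taken when the test is False


class Leaf(object):
    def __init__(self):
        self.value = None


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def make_leaf(labels):
    """Leaf whose value is the majority label."""
    leaf = Leaf()
    leaf.value = Counter(labels).most_common(1)[0][0]
    return leaf


def best_split(rows):
    """Return the (position, base) test with the highest gain, or None."""
    labels = [label for _, label in rows]
    base_entropy = entropy(labels)
    best, best_gain = None, 0.0
    for position in range(len(rows[0][0])):
        for base in 'atgc':
            yes = [lab for seq, lab in rows if seq[position] == base]
            no = [lab for seq, lab in rows if seq[position] != base]
            if not yes or not no:
                continue  # this test does not separate anything
            remainder = (len(yes) * entropy(yes)
                         + len(no) * entropy(no)) / len(rows)
            gain = base_entropy - remainder
            if gain > best_gain + 1e-12:
                best, best_gain = (position, base), gain
    return best


def build_tree(rows):
    """Recursively build a tree; stop when pure or no split helps."""
    labels = [label for _, label in rows]
    split = None if len(set(labels)) == 1 else best_split(rows)
    if split is None:
        return make_leaf(labels)
    position, base = split
    node = Node()
    node.split_variable = split
    node.left_child = build_tree([r for r in rows if r[0][position] == base])
    node.right_child = build_tree([r for r in rows if r[0][position] != base])
    return node


def classify(tree, sequence):
    """Walk from the root to a leaf and return the leaf's value."""
    while isinstance(tree, Node):
        position, base = tree.split_variable
        if sequence[position] == base:
            tree = tree.left_child
        else:
            tree = tree.right_child
    return tree.value


# Made-up training data: (sequence, label) pairs.
rows = [('at', 'pos'), ('ag', 'pos'), ('ta', 'neg'), ('ca', 'neg')]
tree = build_tree(rows)
```

The recursion stops because every accepted split sends at least one row to each side, so the subsets passed down keep getting smaller. If you would rather split four ways (one child per base), you could replace left_child/right_child with a dictionary of children keyed by base and recurse over each key in the same manner.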

Good luck!

prl900 answered Sep 29 '22 13:09
