I've tried several different methods, some of which I found on here, including making a Node class and using nested dictionaries, but I can't get any of them to work.
My code currently takes in several lines of DNA (a, t, g, c) and stores them as a numpy array. It then finds the attribute that gives the most gain and splits the data into 4 new numpy arrays (depending on whether an a, t, g, or c is present at that attribute).
I'm unable to write a recursive function that builds the tree. I'm quite new to Python and to programming in general, so please describe in detail what I should do.
Thanks for any help
If you want to implement a decision tree from scratch, I recommend building your tree using classes. A tree is composed of nodes: each node contains further nodes recursively, and leaves are the terminal nodes. For a binary tree, these classes can be something like:
class Node(object):
    def __init__(self):
        self.split_variable = None
        self.left_child = None
        self.right_child = None

    def get_name(self):
        return 'Node'


class Leaf(object):
    def __init__(self):
        self.value = None

    def get_name(self):
        return 'Leaf'
For the Node class: 'split_variable' will contain the variable name used in the split, i.e. one of [a,t,g,c], and 'left_child' and 'right_child' will be new instances of Node or Leaf. The True/False presence of that variable is mapped to the left/right children. (For a regression tree you'll need to add a fourth attribute, 'split_value', to the Node class and map values less than/greater than it to the left/right children.)
For the Leaf class: 'value' contains the value the tree assigns to the class variable (i.e. the majority class for a discrete variable, or the mean for a continuous one).
To complete your implementation you'll need functions that walk your tree to evaluate and/or visualise it. These functions call themselves recursively until the whole path through the tree has been walked. This is where you can use the get_name() methods of the classes to tell nodes from leaves. How you implement this part really depends on how you store your data; I suggest using pandas DataFrames, which are like tables. A sample evaluate function could be (pseudocode):
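The recursive build step itself could be sketched roughly as below. This is only a minimal sketch under several assumptions of mine, not a definitive implementation: each row is a plain string of bases with one class label per row, a split is encoded as a hypothetical (position, base) pair, and the helper names entropy, best_split and build_tree are made up for illustration. Node and Leaf take constructor arguments here purely for convenience.

```python
from collections import Counter
import math


class Node(object):
    def __init__(self, split_variable, left_child, right_child):
        self.split_variable = split_variable  # assumed encoding: (position, base)
        self.left_child = left_child          # rows where row[position] == base
        self.right_child = right_child        # all other rows

    def get_name(self):
        return 'Node'


class Leaf(object):
    def __init__(self, value):
        self.value = value

    def get_name(self):
        return 'Leaf'


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (position, base) test with the highest gain, or None."""
    best, best_gain = None, 1e-12  # ignore gains that are effectively zero
    for position in range(len(rows[0])):
        for base in 'atgc':
            left = [lab for row, lab in zip(rows, labels) if row[position] == base]
            right = [lab for row, lab in zip(rows, labels) if row[position] != base]
            if not left or not right:
                continue  # this test does not separate anything
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = entropy(labels) - remainder
            if gain > best_gain:
                best, best_gain = (position, base), gain
    return best


def build_tree(rows, labels):
    """Recursively build a binary tree of Node and Leaf objects."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return Leaf(majority)   # base case: all labels agree
    split = best_split(rows, labels)
    if split is None:
        return Leaf(majority)   # base case: no test improves purity
    position, base = split
    left = [(r, l) for r, l in zip(rows, labels) if r[position] == base]
    right = [(r, l) for r, l in zip(rows, labels) if r[position] != base]
    # Recursive case: build each subtree from its own subset of the data.
    return Node(split, build_tree(*zip(*left)), build_tree(*zip(*right)))


tree = build_tree(['at', 'ag', 'ct', 'cg'], ['x', 'x', 'y', 'y'])
```

The two base cases are what stop the recursion: a subset whose labels all agree, or a subset where no test gives any gain. Because every accepted split leaves rows on both sides, each recursive call works on a strictly smaller subset.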
def evaluate_tree(your_data, node):
    if your_data[node.split_variable]:
        # Variable present: follow the left branch.
        if node.left_child.get_name() == 'Node':
            # The recursive result must be returned, or the caller gets None.
            return evaluate_tree(your_data, node.left_child)
        elif node.left_child.get_name() == 'Leaf':
            return node.left_child.value
    else:
        # Variable absent: follow the right branch.
        if node.right_child.get_name() == 'Node':
            return evaluate_tree(your_data, node.right_child)
        elif node.right_child.get_name() == 'Leaf':
            return node.right_child.value
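For the visualising side, a walk of the same shape can collect one indented text line per node. This is just a sketch: tree_to_lines is a hypothetical helper name, and the Node and Leaf classes are repeated with constructor arguments (my own convenience change) so the example runs on its own.

```python
class Node(object):
    def __init__(self, split_variable, left_child, right_child):
        self.split_variable = split_variable
        self.left_child = left_child
        self.right_child = right_child

    def get_name(self):
        return 'Node'


class Leaf(object):
    def __init__(self, value):
        self.value = value

    def get_name(self):
        return 'Leaf'


def tree_to_lines(node, depth=0):
    """Return the tree as a list of text lines, indented by depth."""
    indent = '    ' * depth
    if node.get_name() == 'Leaf':
        # Base case: a leaf has no children to descend into.
        return [indent + 'Leaf: ' + str(node.value)]
    lines = [indent + 'Node: ' + str(node.split_variable)]
    lines += tree_to_lines(node.left_child, depth + 1)   # True branch
    lines += tree_to_lines(node.right_child, depth + 1)  # False branch
    return lines


# A small hand-built tree, only for demonstration.
tree = Node('a', Leaf('yes'), Node('t', Leaf('no'), Leaf('yes')))
print('\n'.join(tree_to_lines(tree)))
```

Returning a list of lines instead of printing inside the recursion keeps the function easy to check and lets you decide later whether to print the result or write it to a file.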
Good luck!