 

Scikit-Learn Custom Decision Tree Leaf Types

Tags:

scikit-learn

Is it possible to define custom decision tree leaf types when using sci-kit learn?

I would like to train Random Forests using more complicated leaves, such as leaves containing linear regressors or gaussians. This would probably require defining a custom leaf type and implementing a new split criterion. Is that possible?

Thank you.

Mageek asked Dec 10 '15 18:12



1 Answer

This is possible to do, but not very sensible.

The decision tree in sklearn is written in Cython (a Python-like language that compiles to C) and uses a predetermined list of Cython split criteria. This makes sklearn trees very fast, but not easily customizable.

If you write your own leaves and splitter in pure Python, you will have to integrate them with sklearn's Cython code. This is possible, but likely long and hard. And in the end you will have slow code, because the tree builder will call back into Python from C at every node. It might well be cheaper to write the tree-building algorithm from scratch.
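A lighter-weight alternative, staying entirely in Python, is to approximate linear-model leaves after the fact: fit an ordinary `DecisionTreeRegressor`, then replace each leaf's constant prediction with a `LinearRegression` fitted on the training samples routed to that leaf (found via `tree.apply`). This is a sketch, not a true custom split criterion: the splits were still chosen assuming constant leaves. The class name `LeafLinearTree` is illustrative, not part of sklearn.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class LeafLinearTree:
    """Decision tree whose leaves predict with linear regressors
    fitted post hoc, instead of the usual leaf-mean constants."""

    def __init__(self, max_depth=3, min_samples_leaf=20):
        self.tree = DecisionTreeRegressor(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        # tree.apply maps each sample to the index of the leaf it lands in
        leaves = self.tree.apply(X)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_pred = np.empty(X.shape[0])
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            y_pred[mask] = self.leaf_models[leaf].predict(X[mask])
        return y_pred

# On piecewise-linear data, linear leaves fit much better than constants
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.where(X[:, 0] > 0, 2 * X[:, 0], -X[:, 0]) + rng.normal(0, 0.1, 500)
model = LeafLinearTree().fit(X, y)
```

Note this only changes the leaf *prediction*; the split criterion still minimizes variance around a constant, which is the part that genuinely requires touching the Cython code.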

If you are very serious about this endeavor, you can write the leaves and splitter in Cython, so that they integrate with scikit-learn and are just as fast. But this solution is less flexible, because you will not be able to swap in another leaf model from Python.

If you want to quickly try such a model, you could use the M5 algorithm (model trees with linear regressors in the leaves) in Weka.

And if you ask my opinion, I don't really see why you would want more complex models in each leaf of a Random Forest - it is already complex enough.

David Dale answered Nov 19 '22 19:11