Is it possible to define custom decision tree leaf types when using scikit-learn?
I would like to train Random Forests with more complicated leaves, such as leaves containing linear regressors or Gaussians. This would probably require defining a custom leaf type and implementing a new split criterion. Is that possible?
Thank you.
This is possible to do, but not very sensible.
The decision tree code in sklearn is written in Cython (a compiled, C-like superset of Python) and uses a predetermined list of Cython split criteria. This makes sklearn trees very fast, but not easily customizable.
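To illustrate: the public API only lets you choose a criterion by name from the fixed list compiled into the Cython code; there is no parameter for passing a user-defined split criterion or leaf model. A minimal sketch:

```python
# The `criterion` parameter accepts only the names baked into sklearn's
# compiled Cython code, e.g. "squared_error", "friedman_mse",
# "absolute_error", "poisson" for regression trees.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)

tree = DecisionTreeRegressor(criterion="friedman_mse", max_depth=3).fit(X, y)
pred = tree.predict(X)
```

Passing anything other than one of the predefined names raises an error, which is exactly why custom leaves require going below the Python layer.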
If you write your own leaves and splitter in pure Python, you will have to integrate them with sklearn's Cython code. This is possible, but can be long and hard, and in the end you will have slow code, because it will call back into Python at every node. It may well be cheaper to write the tree-building algorithm from scratch.
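For a sense of scale, writing it from scratch is not that much code. Below is a minimal, illustrative sketch of a "model tree": an ordinary binary regression tree that uses variance reduction as the split criterion and fits a linear regressor in each leaf instead of a constant. The class name and all parameters are my own invention, not part of any library.

```python
# Minimal from-scratch model tree: binary splits chosen by variance
# reduction, with a LinearRegression model fitted in every leaf.
# Purely illustrative; not an sklearn API.
import numpy as np
from sklearn.linear_model import LinearRegression

class ModelTree:
    def __init__(self, max_depth=3, min_leaf=20):
        self.max_depth = max_depth
        self.min_leaf = min_leaf

    def fit(self, X, y, depth=0):
        self.leaf_model = None
        self.split = None
        if depth >= self.max_depth or len(y) < 2 * self.min_leaf:
            self.leaf_model = LinearRegression().fit(X, y)
            return self
        best = None  # (weighted child variance, feature, threshold)
        for j in range(X.shape[1]):
            for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                left = X[:, j] <= t
                if left.sum() < self.min_leaf or (~left).sum() < self.min_leaf:
                    continue
                # variance reduction as a stand-in split criterion
                score = left.sum() * y[left].var() + (~left).sum() * y[~left].var()
                if best is None or score < best[0]:
                    best = (score, j, t)
        if best is None:  # no admissible split: fall back to a leaf
            self.leaf_model = LinearRegression().fit(X, y)
            return self
        _, j, t = best
        self.split = (j, t)
        mask = X[:, j] <= t
        self.left = ModelTree(self.max_depth, self.min_leaf).fit(X[mask], y[mask], depth + 1)
        self.right = ModelTree(self.max_depth, self.min_leaf).fit(X[~mask], y[~mask], depth + 1)
        return self

    def predict(self, X):
        if self.leaf_model is not None:
            return self.leaf_model.predict(X)
        j, t = self.split
        out = np.empty(len(X))
        mask = X[:, j] <= t
        if mask.any():
            out[mask] = self.left.predict(X[mask])
        if (~mask).any():
            out[~mask] = self.right.predict(X[~mask])
        return out
```

On a piecewise-linear target this already fits well; to get a forest you would train many such trees on bootstrap samples and average their predictions.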
If you are very serious about your endeavor, you can write the leaves and the splitter in Cython themselves, making them easily integrable with scikit-learn and just as fast. But this solution will not be as flexible, because you will not be able to plug in another leaf model from Python.
If you want a quick try of such a model, you could use the M5 algorithm (model trees with linear regressors at the leaves) in Weka.
And if you ask my opinion, I hardly see why you would want to put more complex models into each leaf of a Random Forest - it is already complex enough.