 

Scikit-Learn Custom Decision Tree Leaf Types

Tags:

scikit-learn

Is it possible to define custom decision tree leaf types when using sci-kit learn?

I would like to train Random Forests using more complicated leaves, such as leaves containing linear regressors or gaussians. This would probably require defining a custom leaf type and implementing a new split criterion. Is that possible?

Thank you.

Mageek asked Dec 10 '15 18:12



1 Answer

This is possible to do, but not very sensible.

The decision tree in sklearn is written in Cython (a Python-like language that compiles to C) and uses a predetermined list of Cython split criteria. This makes sklearn trees very fast, but not easily customizable.

If you write your own leaves and splitter in pure Python, you will have to integrate them with sklearn's Cython code. This is possible, but likely long and hard. And in the end you will have slow code, because the tree builder will call back into Python from C at every node. It might well be cheaper to write the tree-building algorithm from scratch.
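A lighter-weight alternative, staying entirely in Python, is to approximate linear-model leaves after the fact: fit an ordinary `DecisionTreeRegressor`, then replace each leaf's constant prediction with a `LinearRegression` fitted on the training samples routed to that leaf (found via `tree.apply`). This is a sketch, not a true custom split criterion: the splits were still chosen assuming constant leaves. The class name `LeafLinearTree` is illustrative, not part of sklearn.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class LeafLinearTree:
    """Decision tree whose leaves predict with linear regressors
    fitted post hoc, instead of the usual leaf-mean constants."""

    def __init__(self, max_depth=3, min_samples_leaf=20):
        self.tree = DecisionTreeRegressor(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        # tree.apply maps each sample to the index of the leaf it lands in
        leaves = self.tree.apply(X)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_pred = np.empty(X.shape[0])
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            y_pred[mask] = self.leaf_models[leaf].predict(X[mask])
        return y_pred

# On piecewise-linear data, linear leaves fit much better than constants
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.where(X[:, 0] > 0, 2 * X[:, 0], -X[:, 0]) + rng.normal(0, 0.1, 500)
model = LeafLinearTree().fit(X, y)
```

Note this only changes the leaf *prediction*; the split criterion still minimizes variance around a constant, which is the part that genuinely requires touching the Cython code.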

If you are very serious about this endeavor, you can write the leaves and splitter in Cython, so that they integrate with scikit-learn and are just as fast. But this solution is less flexible, because you will not be able to swap in another leaf model from Python.

If you want to quickly try such a model, you could use the M5 algorithm (model trees with linear regressors in the leaves) in Weka.

And if you ask my opinion, I don't really see why you would want more complex models in each leaf of a Random Forest - it is already complex enough.

David Dale answered Nov 19 '22 19:11