What does "splitter" attribute in sklearn's DecisionTreeClassifier do?

Tags: python, python-3.x, machine-learning, scikit-learn

The sklearn DecisionTreeClassifier has an attribute called "splitter", which is set to "best" by default. What does setting it to "best" or "random" do? I couldn't find enough information in the official documentation.

791

asked Oct 15 '17 15:10

Vijayabhaskar J

2 Answers

Short answer:

RandomSplitter initiates a **random split on each chosen feature**, whereas BestSplitter goes through **all possible splits on each chosen feature**.

Longer explanation:

This becomes clear when you read through `_splitter.pyx`.

RandomSplitter calculates the improvement only at a single, randomly drawn threshold (ref. lines 761 and 801). BestSplitter goes through all possible splits in a while loop (ref. lines 436, where the loop starts, and 462). [Note: line numbers refer to version 0.21.2.]

As opposed to earlier responses from 15 Oct 2017 and 1 Feb 2018, RandomSplitter and BestSplitter both loop through all relevant features. This is also evident in _splitter.pyx.
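To make the difference concrete, here is a minimal NumPy sketch of the two strategies for a single feature. It is not the code from `_splitter.pyx`: the helper names (`gini`, `weighted_impurity`, `best_threshold`, `random_threshold`) are made up for illustration, and Gini impurity is assumed as the split criterion.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array (0.0 for an empty or pure array)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity(x, y, threshold):
    """Size-weighted Gini impurity of the two children of the split x <= threshold."""
    left = x <= threshold
    n = len(y)
    return (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n

def best_threshold(x, y):
    """'best' style: score every midpoint between distinct values, keep the lowest impurity."""
    values = np.unique(x)  # sorted distinct feature values
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [weighted_impurity(x, y, t) for t in candidates]
    return candidates[int(np.argmin(scores))]

def random_threshold(x, rng):
    """'random' style: draw one threshold uniformly between the feature's min and max."""
    return rng.uniform(x.min(), x.max())

# Hypothetical toy data: one feature, two well-separated classes.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.default_rng(0)

t_best = best_threshold(x, y)
t_rand = random_threshold(x, rng)
print("best threshold:  ", t_best, "impurity:", weighted_impurity(x, y, t_best))
print("random threshold:", t_rand, "impurity:", weighted_impurity(x, y, t_rand))
```

In both cases the tree would repeat this for each candidate feature and keep the feature whose threshold gives the largest improvement. Only the way the threshold is picked differs.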

138

answered Sep 28 '22 12:09

JSong

In fact, the "random" option is what sklearn uses to implement the extremely randomized tree (`ExtraTreeClassifier`). In a nutshell, with this setting the splitting algorithm still traverses all candidate features, but for each one it chooses the split point at random between the feature's minimum and maximum values. If you are interested in the algorithm's details, you can refer to this paper [1]. Moreover, if you are interested in the detailed implementation of this algorithm, you can refer to this page.

[1] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
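If you want to see the effect yourself, a quick comparison along these lines should work. The iris dataset and `random_state=0` are arbitrary choices for illustration, and the exact tree sizes you get may differ between sklearn versions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same data and seed; only the split-point strategy differs.
best_tree = DecisionTreeClassifier(splitter="best", random_state=0).fit(X, y)
random_tree = DecisionTreeClassifier(splitter="random", random_state=0).fit(X, y)

print("best:  ", best_tree.get_n_leaves(), "leaves, depth", best_tree.get_depth())
print("random:", random_tree.get_n_leaves(), "leaves, depth", random_tree.get_depth())
```

Because random thresholds are usually less informative than the optimal ones, the "random" tree often needs more splits to separate the same training data, which is why it is typically used inside ensembles rather than on its own.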

answered Sep 28 '22 12:09

zhenlingcn
