I'm trying to figure out which decision tree method from the scikit-learn package will better suit my needs for performing a classification task.
However, I found that there are two decision tree models available there:
- DecisionTreeClassifier
- ExtraTreeClassifier
Can anyone specify the advantages and disadvantages of using each of these models?
The main advantage of the decision tree classifier is its ability to use different feature subsets and decision rules at different stages of classification. A general decision tree consists of one root node, a number of internal and leaf nodes, and branches.
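For concreteness, here is a minimal sketch of fitting a single DecisionTreeClassifier and printing its learned structure and decision rules (the iris data and max_depth=3 are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Fit a single decision tree: each internal node tests one feature
# against a threshold; each leaf assigns a class.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print(export_text(clf, feature_names=iris.feature_names))
```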
In terms of computational cost, and therefore execution time, the Extra Trees algorithm is faster. Training otherwise proceeds the same way as for a standard tree ensemble, but it chooses each split point at random instead of searching for the optimal one, which is what saves time.
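A rough way to see the speed difference is to time both ensembles on the same synthetic data; the exact numbers depend on your machine and dataset, so treat this as an illustrative sketch:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=10_000, n_features=40, random_state=0)

# Extra Trees usually fits faster because it draws split thresholds
# at random instead of optimizing them at every node.
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    start = time.perf_counter()
    Model(n_estimators=100, random_state=0).fit(X, y)
    print(f"{Model.__name__}: {time.perf_counter() - start:.2f}s")
```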
Extra Trees is an ensemble of decision trees, related to bagging and random forests. From the scikit-learn documentation for ExtraTreesClassifier: "This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting."
ExtraTreeClassifier is an extremely randomized version of DecisionTreeClassifier, meant to be used internally as part of the ExtraTreesClassifier ensemble.
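You can verify this relationship on a fitted ensemble: the individual trees stored in its estimators_ attribute are ExtraTreeClassifier instances. A small sketch:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)
ensemble = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

# The fitted ensemble keeps its member trees in estimators_,
# and each member is an ExtraTreeClassifier.
print(type(ensemble.estimators_[0]).__name__)  # -> ExtraTreeClassifier
```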
Averaging ensembles such as RandomForestClassifier and ExtraTreesClassifier are meant to tackle the variance problems (lack of robustness with respect to small changes in the training set) of individual DecisionTreeClassifier instances.
If your main goal is maximizing prediction accuracy, you should almost always use an ensemble of decision trees such as ExtraTreesClassifier (or alternatively a boosting ensemble) instead of training individual decision trees.
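A quick cross-validation comparison illustrates the point; with synthetic data and arbitrary parameters like these, the ensemble typically shows a higher mean score and a smaller spread across folds than a single tree, though results will vary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

for name, model in [
    ("single tree", DecisionTreeClassifier(random_state=0)),
    ("extra trees", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    # Mean reflects accuracy; std across folds is a rough proxy
    # for the variance the ensemble is meant to reduce.
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```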
Have a look at the original Extra Trees paper (Geurts, Ernst & Wehenkel, "Extremely randomized trees", Machine Learning, 2006) for more details.