
DecisionTreeClassifier vs ExtraTreeClassifier

I'm trying to figure out which decision tree method from the scikit-learn package will better suit my needs for a classification task.

However, I found that there are two decision tree models available there:

  • the standard DecisionTreeClassifier, based on the optimized CART algorithm, from the sklearn.tree package.
  • the ensemble method ExtraTreeClassifier from the sklearn.ensemble package.

Can anyone specify the advantages and disadvantages of using each of these models?

asked Nov 24 '13 by dragoon

People also ask

Why is DecisionTreeClassifier used?

The main advantage of the decision tree classifier is its ability to use different feature subsets and decision rules at different stages of classification. A general decision tree consists of one root node, a number of internal and leaf nodes, and branches.
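
As a quick illustration of that structure, here is a minimal sketch (the iris dataset and max_depth=2 are arbitrary choices for brevity, not part of the original text) that fits a DecisionTreeClassifier and prints its root, internal, and leaf nodes with scikit-learn's export_text helper:

```python
# Minimal sketch: fit a shallow DecisionTreeClassifier and inspect its structure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders the root split, internal splits, and leaves as indented text.
print(export_text(clf, feature_names=list(iris.feature_names)))
```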

Is extra trees better than random forest?

In terms of computational cost, and therefore execution time, the Extra Trees algorithm is faster: the overall procedure is the same as for a random forest, but split points are chosen at random rather than by searching for the optimal ones.
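
A rough way to check this yourself is to time both ensembles on the same synthetic data. The sketch below is only indicative (dataset size, n_estimators, and the resulting timings are arbitrary and hardware-dependent):

```python
# Rough timing sketch: Extra Trees skips the search for optimal split points,
# so fitting is typically faster than a random forest of the same size.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=40, random_state=0)

for name, clf in [
    ("RandomForestClassifier", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("ExtraTreesClassifier", ExtraTreesClassifier(n_estimators=200, random_state=0)),
]:
    start = time.perf_counter()
    clf.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f}s to fit")
```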

Is extra tree bagging or boosting?

Extra Trees is an ensemble of decision trees and, like random forest, is related to bagging.

What is the use of Extratreesregressor?

An extra-trees regressor. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
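
A minimal usage sketch, assuming a synthetic regression problem (the data, hyperparameters, and train/test split are illustrative only):

```python
# Minimal ExtraTreesRegressor sketch on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Averaging many randomized trees reduces variance relative to a single tree.
reg = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", reg.score(X_test, y_test))
```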


1 Answer

ExtraTreeClassifier is an extremely randomized version of DecisionTreeClassifier meant to be used internally as part of the ExtraTreesClassifier ensemble.
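
One way to see this relationship is to fit an ExtraTreesClassifier and inspect its fitted estimators_ attribute: each member of the ensemble is an ExtraTreeClassifier. A minimal sketch (the iris dataset and n_estimators value are just for brevity):

```python
# Sketch: the trees inside an ExtraTreesClassifier are ExtraTreeClassifier instances.
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import ExtraTreeClassifier

X, y = load_iris(return_X_y=True)
forest = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the individual fitted trees of the ensemble.
print(isinstance(forest.estimators_[0], ExtraTreeClassifier))  # True
```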

Averaging ensembles such as RandomForestClassifier and ExtraTreesClassifier are meant to tackle the variance problems (lack of robustness with respect to small changes in the training set) of individual DecisionTreeClassifier instances.

If your main goal is maximizing prediction accuracy, you should almost always use an ensemble of decision trees such as ExtraTreesClassifier (or alternatively a boosting ensemble) instead of training individual decision trees.
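
As a hedged illustration of both points, the sketch below cross-validates a single tree against the two averaging ensembles on a bundled toy dataset; the dataset, fold count, and hyperparameters are arbitrary and the scores are only indicative:

```python
# Sketch: compare a single decision tree with averaging ensembles via cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for name, clf in [
    ("single tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("extra trees", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:>13}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```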

Have a look at the original Extra-Trees paper (Geurts, Ernst & Wehenkel, 2006, "Extremely randomized trees") for more details.

answered Sep 20 '22 by ogrisel