What is the difference between XGBoost, ExtraTreesClassifier, and RandomForestClassifier?

I am new to all these methods and am trying to get a simple answer, or perhaps someone could point me to a high-level explanation somewhere on the web. My googling only returned Kaggle sample code.

Are extra trees and random forest essentially the same? And XGBoost uses boosting when it chooses the features for any particular tree, i.e., when sampling the features. But then how do the other two algorithms select the features?

Thanks!

vvv asked Feb 06 '16 03:02


1 Answer

Extra-trees (ET), aka extremely randomized trees, are quite similar to random forest (RF). Both are bagging methods that aggregate many fully grown decision trees. At each split, RF tries only a random subset of the features (e.g., a third of them), but evaluates every possible break point within those features and picks the best. ET, however, evaluates only a few random break points and picks the best of these. ET can draw a bootstrap sample for each tree or train every tree on all samples; RF must use bootstrapping to work well.
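
To make the split-selection difference concrete, here is a minimal pure-Python sketch of how a single tree node might pick its split under each scheme. It assumes Gini impurity as the split criterion, treats `max_features` as the size of the random feature subset, and has ET draw one uniform random threshold per candidate feature. The function names (`rf_split`, `et_split`, etc.) are hypothetical; this is an illustration, not scikit-learn's actual implementation.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_score(X, y, feature, threshold):
    """Weighted Gini impurity of the children from the test x[feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def rf_split(X, y, max_features, rng):
    """RF-style: random feature subset, then EVERY midpoint threshold is tried.
    Returns (score, feature, threshold), or None if no split is possible."""
    features = rng.sample(range(len(X[0])), max_features)
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            score = split_score(X, y, f, t)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def et_split(X, y, max_features, rng):
    """ET-style: random feature subset, then ONE random threshold per feature.
    Returns (score, feature, threshold), or None if no split is possible."""
    features = rng.sample(range(len(X[0])), max_features)
    best = None
    for f in features:
        lo = min(row[f] for row in X)
        hi = max(row[f] for row in X)
        if lo == hi:
            continue  # feature is constant at this node, nothing to split on
        t = rng.uniform(lo, hi)
        score = split_score(X, y, f, t)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[0.1, 5.0], [0.4, 3.0], [0.35, 4.0], [0.8, 1.0], [0.9, 2.0], [0.7, 0.5]]
    y = [0, 0, 0, 1, 1, 1]
    print("RF split:", rf_split(X, y, max_features=2, rng=rng))
    print("ET split:", et_split(X, y, max_features=2, rng=rng))
```

Each ET split evaluates far fewer candidate thresholds than the exhaustive RF search, so it is cheaper per node and more randomized; averaging many such trees is what smooths that randomness out.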

XGBoost is an implementation of gradient boosting and can work with decision trees, typically smaller ones. Each tree is trained to correct the residuals of the previously trained trees. Gradient boosting can be more difficult to train, but can achieve lower model bias than RF. For noisy data, bagging is likely to be most promising; for low-noise data with complex structure, boosting is likely to be most promising.
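
The residual-fitting loop can be sketched in a few lines of pure Python. This is a minimal illustration of gradient boosting with squared-error loss (where the negative gradient is simply the residual) and depth-1 trees ("stumps") on a single numeric feature; it is not XGBoost itself, which adds regularization, second-order gradient information, and much more. The function names and the `learning_rate=0.1` default are assumptions chosen for illustration.

```python
def fit_stump(x, r):
    """Least-squares regression stump on one feature: returns (threshold,
    left_value, right_value). Assumes x has at least two distinct values."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv = sum(left) / len(left)  # leaf value = mean residual in that leaf
        rv = sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv


def stump_predict(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv


def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Squared-error gradient boosting: each new stump is fit to the residuals
    (the negative gradient) of the ensemble built so far."""
    base = sum(y) / len(y)  # start from a constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # shrink each correction by the learning rate before adding it
        pred = [pi + learning_rate * stump_predict(stump, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]
    for n in (0, 1, 10, 100):
        print(n, "trees -> training MSE", round(mse(gradient_boost(x, y, n_trees=n), x, y), 4))
```

Unlike the bagging sketch, the trees here are not independent: each stump depends on the errors left by all earlier ones, which is why boosting keeps reducing bias as trees are added.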

Soren Havelund Welling answered Nov 18 '22 16:11