 

Is there a parallel implementation of GBM in R?

I use the gbm library in R and I would like to use all my CPU to fit a model.

gbm.fit(x, y,
        offset = NULL,
        misc = NULL,...
Boris LIM asked Nov 30 '25 03:11

1 Answer

Well, there cannot be a parallel implementation of GBM in principle, neither in R nor in any other language. The reason is very simple: the boosting algorithm is by definition sequential.

Consider the following, quoted from The Elements of Statistical Learning, Ch. 10 (Boosting and Additive Trees), pp. 337-339 (emphasis mine):

A weak classifier is one whose error rate is only slightly better than random guessing. The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers Gm(x), m = 1, 2, . . . , M. The predictions from all of them are then combined through a weighted majority vote to produce the final prediction. [...] Each successive classifier is thereby forced to concentrate on those training observations that are missed by previous ones in the sequence.

In a picture (ibid, p. 338):

[figure omitted: ESL Fig. 10.1 — schematic of the sequential boosting procedure, in which reweighted training samples feed successive weak classifiers whose weighted votes form the final prediction]

In fact, this is frequently noted as a key disadvantage of GBM relative to, say, Random Forest (RF), where the trees are independent of one another and can thus be fitted in parallel (see the bigrf R package).
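To make the contrast concrete: because RF trees are independent, you can grow several smaller forests on separate cores and merge them afterwards. A minimal sketch using the common randomForest + doParallel combination rather than bigrf (here `x` and `y` stand for your predictor matrix and response, and the 4-core count is an assumption):

```r
library(randomForest)  # RF trees are independent -> embarrassingly parallel
library(doParallel)    # also attaches foreach

# Register a parallel backend (assuming 4 cores are available)
cl <- makeCluster(4)
registerDoParallel(cl)

# Grow 4 forests of 125 trees each in parallel, then merge them into a
# single 500-tree forest with randomForest::combine()
rf <- foreach(ntree = rep(125, 4), .combine = randomForest::combine,
              .packages = "randomForest") %dopar% {
  randomForest(x, y, ntree = ntree)
}

stopCluster(cl)
```

No such decomposition is possible for boosting, since tree m + 1 cannot be started before tree m has been fitted.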

Hence, the best you can do, as the commenters above have pointed out, is to use your spare CPU cores to parallelize the cross-validation process.
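The gbm package supports exactly this: its gbm() interface accepts cv.folds and n.cores arguments, so the independent CV folds are farmed out to separate cores while each individual boosting run remains sequential. A sketch, assuming a data frame `df` with a binary response `y` (the tuning values below are illustrative):

```r
library(gbm)

# The CV folds are independent of each other, so gbm can fit them on
# separate cores; each fold's boosting run is still strictly sequential.
fit <- gbm(y ~ ., data = df,
           distribution = "bernoulli",
           n.trees   = 1000,
           shrinkage = 0.01,
           cv.folds  = 5,    # 5-fold cross-validation
           n.cores   = 4)    # folds dispatched to up to 4 cores

# Pick the optimal number of boosting iterations from the CV error curve
best_iter <- gbm.perf(fit, method = "cv")
```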

desertnaut answered Dec 02 '25 18:12


