 

Is there a parallel implementation of GBM in R?

I use the gbm library in R and I would like to use all my CPU to fit a model.

gbm.fit(x, y,
        offset = NULL,
        misc = NULL,...
Boris LIM asked Nov 30 '25 03:11

1 Answer

Well, there cannot be a parallel implementation of GBM in principle, neither in R nor in any other language. The reason is very simple: the boosting algorithm is by definition sequential.

Consider the following, quoted from The Elements of Statistical Learning, Ch. 10 (Boosting and Additive Trees), pp. 337-339 (emphasis mine):

A weak classifier is one whose error rate is only slightly better than random guessing. The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers Gm(x), m = 1, 2, . . . , M. The predictions from all of them are then combined through a weighted majority vote to produce the final prediction. [...] Each successive classifier is thereby forced to concentrate on those training observations that are missed by previous ones in the sequence.

In a picture (ibid, p. 338):

[figure omitted: ESL Fig. 10.1 — schematic of the sequential boosting procedure, in which reweighted training samples feed successive weak classifiers whose weighted votes form the final prediction]

In fact, this is frequently noted as a key disadvantage of GBM relative to, say, Random Forest (RF), where the trees are independent of one another and can thus be fitted in parallel (see the bigrf R package).
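To make the contrast concrete: because RF trees are independent, you can grow several smaller forests on separate cores and merge them afterwards. A minimal sketch using the common randomForest + doParallel combination rather than bigrf (here `x` and `y` stand for your predictor matrix and response, and the 4-core count is an assumption):

```r
library(randomForest)  # RF trees are independent -> embarrassingly parallel
library(doParallel)    # also attaches foreach

# Register a parallel backend (assuming 4 cores are available)
cl <- makeCluster(4)
registerDoParallel(cl)

# Grow 4 forests of 125 trees each in parallel, then merge them into a
# single 500-tree forest with randomForest::combine()
rf <- foreach(ntree = rep(125, 4), .combine = randomForest::combine,
              .packages = "randomForest") %dopar% {
  randomForest(x, y, ntree = ntree)
}

stopCluster(cl)
```

No such decomposition is possible for boosting, since tree m + 1 cannot be started before tree m has been fitted.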

Hence, the best you can do, as the commenters above have pointed out, is to use your spare CPU cores to parallelize the cross-validation process.
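The gbm package supports exactly this: its gbm() interface accepts cv.folds and n.cores arguments, so the independent CV folds are farmed out to separate cores while each individual boosting run remains sequential. A sketch, assuming a data frame `df` with a binary response `y` (the tuning values below are illustrative):

```r
library(gbm)

# The CV folds are independent of each other, so gbm can fit them on
# separate cores; each fold's boosting run is still strictly sequential.
fit <- gbm(y ~ ., data = df,
           distribution = "bernoulli",
           n.trees   = 1000,
           shrinkage = 0.01,
           cv.folds  = 5,    # 5-fold cross-validation
           n.cores   = 4)    # folds dispatched to up to 4 cores

# Pick the optimal number of boosting iterations from the CV error curve
best_iter <- gbm.perf(fit, method = "cv")
```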

desertnaut answered Dec 02 '25 18:12


