Gradient boosting predictions in low-latency production environments?

Can anyone recommend a strategy for making predictions using a gradient boosting model in the <10-15ms range (the faster the better)?

I have been using R's gbm package, but the first prediction takes ~50ms (subsequent vectorized predictions average to 1ms, so there appears to be overhead, perhaps in the call to the C++ library). As a guideline, there will be ~10-50 inputs and ~50-500 trees. The task is classification and I need access to predicted probabilities.

I know there are a lot of libraries out there, but I've had little luck finding information even on rough prediction times for them. The training will happen offline, so only predictions need to be fast -- also, predictions may come from a piece of code / library that is completely separate from whatever does the training (as long as there is a common format for representing the trees).

asked Jul 02 '12 by lockedoff

1 Answer

I'm the author of the scikit-learn gradient boosting module, a Gradient Boosted Regression Trees implementation in Python. I put some effort into optimizing prediction time, since the method was targeted at low-latency environments (in particular ranking problems); the prediction routine is written in C, but there is still some overhead due to Python function calls. That said: prediction time for single data points with ~50 features and about 250 trees should be << 1ms.
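For a concrete sense of the numbers, here is a minimal timing sketch (my own illustration; absolute numbers will depend on your hardware and scikit-learn version) that trains a model of roughly the size the question describes and times single-row predictions:

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # train offline at roughly the size from the question (~50 features, 250 trees)
    X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
    clf = GradientBoostingClassifier(n_estimators=250, random_state=0).fit(X, y)

    row = X[:1]                     # a single data point, shape (1, n_features)
    clf.predict_proba(row)          # warm-up call to amortize one-off overhead

    n = 1000
    t0 = time.perf_counter()
    for _ in range(n):
        clf.predict_proba(row)
    print("mean latency: %.3f ms" % ((time.perf_counter() - t0) / n * 1000))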

In my use cases, prediction time is often dominated by the cost of feature extraction. I strongly recommend profiling to pinpoint the source of the overhead (if you use Python, I can recommend line_profiler).
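For example, a hypothetical end-to-end scoring function instrumented for line_profiler might look like this (extract_features is a stand-in for whatever your pipeline does, and clf is the trained model from the sketch above; run the script with kernprof -l -v to get per-line timings):

    import numpy as np

    def extract_features(raw_record):
        # stand-in for real feature extraction -- often the true bottleneck
        return np.asarray(raw_record, dtype=np.float64).reshape(1, -1)

    @profile   # injected into builtins by kernprof; remove to run as plain Python
    def score(raw_record):
        features = extract_features(raw_record)
        return clf.predict_proba(features)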

If the source of the overhead is prediction rather than feature extraction, you might check whether it's possible to do batch predictions instead of predicting single data points, thus limiting the overhead of the Python function call (e.g. in ranking you often need to score the top-K documents, so you can do the feature extraction first and then run predict on the K x n_features matrix).
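As an illustration (make_raw_record is a hypothetical stand-in for your input source, and extract_features is the sketch from above), scoring K candidates in one call pays the Python-call overhead once:

    import numpy as np

    K = 100
    candidates = [make_raw_record() for _ in range(K)]              # hypothetical input
    X_batch = np.vstack([extract_features(c) for c in candidates]) # (K, n_features)
    scores = clf.predict_proba(X_batch)[:, 1]   # one call scores all K rows
    top10 = np.argsort(scores)[::-1][:10]       # indices of the 10 best documents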

If this doesn't help either, you should try to limit the number of trees, because the runtime cost of prediction is basically linear in the number of trees. There are a number of ways to limit the number of trees without affecting model accuracy:

  1. Proper tuning of the learning rate; the smaller the learning rate, the more trees are needed, and thus the slower prediction becomes.

  2. Post-process the GBM with L1 regularization (Lasso); see Elements of Statistical Learning, Section 16.3.1: use the predictions of each tree as new features, run that representation through an L1-regularized linear model, and remove the trees that don't get any weight (see the sketch after this list).

  3. Fully-corrective weight updates; instead of doing the line-search/weight update just for the most recent tree, update all trees (see [Warmuth2006] and [Johnson2012]). Better convergence means fewer trees.
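Here is a hedged sketch of the Lasso post-processing idea (point 2) with scikit-learn; the estimators_ layout is an implementation detail, binary classification is assumed, and the C value is illustrative. Rebuilding a pruned, rescaled ensemble from the kept trees takes a bit more plumbing:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # one column per tree: the raw output of each individual tree on the training data
    tree_outputs = np.column_stack(
        [tree.predict(X) for tree in clf.estimators_[:, 0]]
    )

    # L1-regularized linear model over the per-tree outputs
    lasso = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
    lasso.fit(tree_outputs, y)

    keep = np.flatnonzero(lasso.coef_[0])   # trees that received nonzero weight
    print("kept %d of %d trees" % (len(keep), tree_outputs.shape[1]))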

If none of the above does the trick, you could investigate cascades or early-exit strategies (see [Chen2012]).
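As a rough illustration of the early-exit idea (the margin threshold of 2.0 is arbitrary, and the per-stage Python loop is only for exposition -- a production version would live inside the tree-traversal code itself):

    import numpy as np

    def early_exit_score(clf, row, margin=2.0):
        # accumulate trees until the score is far enough from the decision boundary
        for i, score in enumerate(clf.staged_decision_function(row)):
            s = float(np.ravel(score)[0])   # per-stage raw score for the single row
            if abs(s) > margin:             # confident enough: stop adding trees
                return s, i + 1             # score and number of trees actually used
        return s, i + 1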

References:

[Warmuth2006] M. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[Johnson2012] Rie Johnson and Tong Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest. arXiv, 2012.

[Chen2012] Minmin Chen, Zhixiang Xu, Kilian Weinberger, Olivier Chapelle, and Dor Kedem. Classifier Cascade for Minimizing Feature Evaluation Cost. JMLR W&CP 22:218-226, 2012.

answered Sep 20 '22 by 2 revs