
Predicting how long a scikit-learn classification will take to run

Is there a way to predict how long it will take to run a classifier from scikit-learn based on the parameters and dataset? I know, pretty meta, right?

Some classifiers/parameter combinations are quite fast, and some take so long that I eventually just kill the process. I'd like a way to estimate in advance how long it will take.

Alternatively, I'd accept some pointers on how to set common parameters to reduce the run time.

ntaggart asked Mar 16 '14 21:03



1 Answer

Some classes of classifiers and regressors directly report the remaining time or the progress of the algorithm (number of iterations, etc.). Most of this can be turned on by passing verbose=2 (or any number greater than 1) to the constructor of the individual model. Note: this behavior is as of sklearn-0.14. Earlier versions have slightly different verbose output (still useful, though).

The best examples of this are ensemble.RandomForestClassifier and ensemble.GradientBoostingClassifier, which print the number of trees built so far and the remaining time.

    from sklearn import ensemble

    # X, y: your training data
    clf = ensemble.GradientBoostingClassifier(verbose=3)
    clf.fit(X, y)

    Out:
       Iter       Train Loss   Remaining Time
          1           0.0769            0.10s
          ...

Or

    from sklearn import ensemble

    # X, y: your training data
    clf = ensemble.RandomForestClassifier(verbose=3)
    clf.fit(X, y)

    Out:
      building tree 1 of 100
      ...

This progress information is fairly useful for estimating the total time.
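If you want a rough number before committing to the full run, one option is to time a small pilot forest and scale the result up. The following is only a sketch: the helper name estimate_forest_fit_seconds and the synthetic data are made up for illustration, and it assumes that fit time grows roughly linearly with the number of trees, which ignores fixed overhead and parallelism.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def estimate_forest_fit_seconds(X, y, target_trees, pilot_trees=5, **params):
    """Time a small pilot forest and scale linearly to target_trees.

    Hypothetical helper, not part of scikit-learn.
    """
    pilot = RandomForestClassifier(n_estimators=pilot_trees, **params)
    start = time.perf_counter()
    pilot.fit(X, y)
    elapsed = time.perf_counter() - start
    # Assumption: cost grows roughly linearly with the number of trees.
    return elapsed * target_trees / pilot_trees


# Synthetic stand-in data; replace with your own X and y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
estimate = estimate_forest_fit_seconds(X, y, target_trees=100, random_state=0)
print(f"estimated fit time for 100 trees: {estimate:.2f}s")
```

Treat the result as an order-of-magnitude guide; the pilot should use the same data and the same remaining parameters as the real run.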

Then there are other models like SVMs that print the number of optimization iterations completed, but do not directly report the remaining time.

    from sklearn import svm

    # X, y: your training data
    clf = svm.SVC(verbose=2)
    clf.fit(X, y)

    Out:
       *
        optimization finished, #iter = 1
        obj = -1.802585, rho = 0.000000
        nSV = 2, nBSV = 2
        ...

Linear models, as far as I know, don't provide such diagnostic information.

See this thread for more on what the verbosity levels mean: scikit-learn fit remaining time
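For models that print nothing useful, a generic fallback (not something scikit-learn provides) is to time the fit on a few growing subsets of the data and extrapolate. The sketch below assumes fit time follows a power law in the number of samples and that the rows are already shuffled; time_fit and extrapolate_fit_seconds are hypothetical helper names, and the data is synthetic.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def time_fit(model, X, y):
    """Return wall-clock seconds spent in model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def extrapolate_fit_seconds(make_model, X, y, pilot_sizes, target_size):
    """Fit time ~ a * n**b on pilot subsets, then predict for target_size.

    Assumes rows are shuffled, so X[:n] is a representative sample.
    """
    times = [time_fit(make_model(), X[:n], y[:n]) for n in pilot_sizes]
    # Straight-line fit in log-log space: log(t) = b * log(n) + log(a).
    b, log_a = np.polyfit(np.log(pilot_sizes), np.log(times), deg=1)
    return float(np.exp(log_a) * target_size ** b)


# Synthetic stand-in data; replace with your own X and y.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
estimate = extrapolate_fit_seconds(
    lambda: LogisticRegression(max_iter=1000),
    X, y,
    pilot_sizes=[250, 500, 1000],
    target_size=2000,
)
print(f"estimated fit time on 2000 samples: {estimate:.4f}s")
```

With only a few short pilot runs the timings are noisy, so the estimate is best read as a rough guide rather than a prediction.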

Sudeep Juvekar answered Sep 18 '22 15:09