Is there a way to predict how long it will take to run a classifier from scikit-learn based on the parameters and dataset? I know, pretty meta, right?
Some classifier/parameter combinations are quite fast, and some take so long that I eventually just kill the process. I'd like a way to estimate in advance how long it will take.
Alternatively, I'd accept some pointers on how to set common parameters to reduce the run time.
The sklearn predict method predicts an output. scikit-learn provides a set of tools for training and evaluating machine learning models, and it also has tools to predict an output value once the model is trained (for ML techniques that actually make predictions).
Prediction latency is measured as the elapsed time necessary to make a prediction (e.g. in microseconds). Latency is often viewed as a distribution, and operations engineers often focus on the latency at a given percentile of this distribution (e.g. the 90th percentile).
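To make the percentile idea concrete, here is a minimal sketch that times a prediction callable once per sample and reports the latency at a few percentiles. The helper name latency_percentiles and its parameters are made up for illustration, not part of scikit-learn, and the nearest-rank percentile rule is just one reasonable choice.

```python
import math
import time


def latency_percentiles(predict, samples, percentiles=(50, 90, 99)):
    """Time predict(sample) for each sample; return {percentile: seconds}.

    `predict` is any callable taking one sample (hypothetical helper,
    not a scikit-learn API).
    """
    timings = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("need at least one sample")
    timings.sort()
    result = {}
    for p in percentiles:
        # Nearest-rank rule: smallest timing with at least p% of the
        # measurements at or below it.
        rank = max(1, math.ceil(p / 100 * len(timings)))
        result[p] = timings[rank - 1]
    return result


if __name__ == "__main__":
    # Stand-in workload; replace with a real prediction call.
    stats = latency_percentiles(lambda x: sum(range(x)), list(range(1000)))
    for p, seconds in stats.items():
        print(f"p{p}: {seconds * 1e6:.1f} microseconds")
```

With a fitted scikit-learn estimator clf, something like latency_percentiles(lambda row: clf.predict([row]), X_test) should work, assuming X_test is an iterable of 1-D feature rows.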
Some specific classes of classifiers and regressors directly report the remaining time or the progress of the algorithm (number of iterations, etc.). Most of this can be turned on by passing the verbose=2 option (or any number > 1) to the constructor of the individual model. Note: this behavior is as of sklearn-0.14. Earlier versions have slightly different verbose output (still useful, though).
The best examples of this are ensemble.RandomForestClassifier and ensemble.GradientBoostingClassifier, which print the number of trees built so far and, for gradient boosting, the remaining time.
    from sklearn import ensemble

    # X, y are your training data
    clf = ensemble.GradientBoostingClassifier(verbose=3)
    clf.fit(X, y)
    Out:
       Iter       Train Loss   Remaining Time
          1           0.0769            0.10s
          ...
Or
    from sklearn import ensemble

    # X, y are your training data
    clf = ensemble.RandomForestClassifier(verbose=3)
    clf.fit(X, y)
    Out:
    building tree 1 of 100
    ...
This progress information is fairly useful to estimate the total time.
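As a rough sketch of how such progress lines can be turned into a time estimate: the helpers below (parse_progress and estimate_remaining are hypothetical names, not scikit-learn functions) parse a line in the "building tree K of N" format shown above and extrapolate linearly. This assumes every tree costs about the same as the average so far, and that the log format matches your scikit-learn version.

```python
import re

# Matches the random forest progress line shown above; the exact
# format is an assumption and may differ between versions.
_TREE_LINE = re.compile(r"building tree (\d+) of (\d+)")


def parse_progress(line):
    """Return (done, total) from a progress line, or None if no match."""
    match = _TREE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def estimate_remaining(elapsed_seconds, done, total):
    """Linear extrapolation of the remaining time in seconds.

    Assumes each remaining unit of work costs the average observed so far.
    """
    if done <= 0 or done > total:
        raise ValueError("need 0 < done <= total")
    per_unit = elapsed_seconds / done
    return per_unit * (total - done)


if __name__ == "__main__":
    done, total = parse_progress("building tree 10 of 100")
    # Suppose 10 seconds have elapsed since fit() started.
    print(estimate_remaining(10.0, done, total))
```

The estimate is only as good as the constant-cost assumption; it tends to be reasonable for random forests, where trees are built independently.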
Then there are other models like SVMs that print the number of optimization iterations completed, but do not directly report the remaining time.
    from sklearn import svm

    # X, y are your training data
    clf = svm.SVC(verbose=2)
    clf.fit(X, y)
    Out:
    *
    optimization finished, #iter = 1
    obj = -1.802585, rho = 0.000000
    nSV = 2, nBSV = 2
    ...
As far as I know, linear models don't provide such diagnostic information.
Check this thread to know more about what the verbosity levels mean: scikit-learn fit remaining time