I am using scikit-learn SVC to classify some data. I would like to increase the training performance.
clf = svm.SVC(cache_size=4000, probability=True, verbose=True)
Since sckikit-learn interfaces with libsvm and libsvm uses OpenMp I was hoping that:
export OMP_NUM_THREADS=16
would run on multiple cores. Unfortunately this did not help.
Any Ideas?
Thanks
There is no OpenMP support in the current binding for libsvm in scikit-learn. However it is very likely that if you have performance issues with sklearn.svm.SVC
should you use a more scalable model instead.
If your data is high dimensional it might be linearly separable. In that case it is advised to first try simpler models such as naive bayes models or sklearn.linear_model.Perceptron
that are known to be very speedy to train. You can also try sklearn.linear_model.LogisticRegression
and sklearn.svm.LinearSVC
both implemented using liblinear
that is more scalable than libsvm
albeit less memory efficients than other linear models in scikit-learn.
If your data is not linearly separable, you can try sklearn.ensemble.ExtraTreesClassifier
(adjust the n_estimators
parameter to trade-off training speed vs. predictive accuracy).
Alternatively you can try to approximate a RBF kernel using the RBFSampler
transformer of scikit-learn + fitting a linear model on the output:
http://scikit-learn.org/dev/modules/kernel_approximation.html
If you are using cross validation or grid search in scikit-learn then you can use multiple CPUs with the n_jobs parameter:
GridSearchCV(..., n_jobs=-1)
cross_val_score(..., n_jobs=-1)
Note that cross_val_score only needs a job per forld so if your number of folds is less than your CPUs you still won't be using all of your processing power.
LibSVM can use OpenMP if you can compile it and use it directly as per these instructions in the LibSVM FAQ. So you could export your scaled data in LibSVM format (here's a StackOverflow question on how to do that) and use LibSVM directly to train your data. But that will only be of benefit if you're grid searching or wanting to know accuracy scores, as far as I know the model LibSVM creates cannot be used in scikit-learn.
There is also a GPU accelerated version of LibSVM which I have tried and is extremely fast, but is not based on the current LibSVM version. I have talked to the developers and they say they hope to release a new version soon.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With