
scikit-learn OpenMP libsvm

I am using scikit-learn's SVC to classify some data, and I would like to speed up training.

from sklearn import svm
clf = svm.SVC(cache_size=4000, probability=True, verbose=True)

Since scikit-learn interfaces with libsvm, and libsvm uses OpenMP, I was hoping that:

export OMP_NUM_THREADS=16

would make training run on multiple cores. Unfortunately, this did not help.

Any ideas?

Thanks

jassinm asked Dec 16 '22 17:12


2 Answers

There is no OpenMP support in the current scikit-learn binding for libsvm. However, if you have performance issues with sklearn.svm.SVC, you should very likely use a more scalable model instead.

If your data is high dimensional, it might be linearly separable. In that case, first try simpler models such as naive Bayes or sklearn.linear_model.Perceptron, which are known to be very fast to train. You can also try sklearn.linear_model.LogisticRegression and sklearn.svm.LinearSVC, both implemented using liblinear, which is more scalable than libsvm, albeit less memory efficient than other linear models in scikit-learn.
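As a rough illustration of the linear baselines mentioned above, here is a minimal sketch that fits each of them on synthetic data. The dataset from make_classification, its size, and the choice of GaussianNB as the naive Bayes variant are assumptions made for the example, not part of the original question. solver="liblinear" is passed explicitly because recent scikit-learn releases default to a different solver for LogisticRegression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic, high-dimensional binary data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=200, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fast linear (or near-linear) baselines to try before a kernel SVC.
models = {
    "GaussianNB": GaussianNB(),
    "Perceptron": Perceptron(random_state=0),
    "LogisticRegression": LogisticRegression(solver="liblinear"),
    "LinearSVC": LinearSVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out split.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy={scores[name]:.3f}")
```

If one of these already reaches acceptable accuracy, the kernel SVC may not be needed at all.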

If your data is not linearly separable, you can try sklearn.ensemble.ExtraTreesClassifier (adjust the n_estimators parameter to trade off training speed against predictive accuracy).
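The n_estimators trade-off can be checked empirically. The sketch below times ExtraTreesClassifier for a few forest sizes on synthetic data; the dataset and the particular values 10, 50 and 200 are assumptions for illustration. It also sets n_jobs=-1, which lets the forest build its trees on all available cores.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = []
for n_estimators in (10, 50, 200):
    clf = ExtraTreesClassifier(n_estimators=n_estimators, n_jobs=-1, random_state=0)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    accuracy = clf.score(X_test, y_test)
    results.append((n_estimators, elapsed, accuracy))
    print(f"n_estimators={n_estimators}: fit {elapsed:.2f}s, accuracy {accuracy:.3f}")
```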

Alternatively, you can approximate an RBF kernel using scikit-learn's RBFSampler transformer and fit a linear model on its output:

http://scikit-learn.org/dev/modules/kernel_approximation.html
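A minimal sketch of that approach, assuming a toy two-moons dataset and SGDClassifier as the linear model (any linear classifier could be swapped in). The gamma and n_components values are illustrative guesses and would need tuning on real data.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data that is not linearly separable in the input space (assumption).
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBFSampler maps inputs to random Fourier features that approximate an RBF
# kernel; the linear model is then trained on those features.
approx_rbf_clf = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=1.0, n_components=300, random_state=0),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)
approx_rbf_clf.fit(X_train, y_train)

accuracy = approx_rbf_clf.score(X_test, y_test)
print(f"approximate RBF + linear model accuracy: {accuracy:.3f}")
```

Increasing n_components generally improves the kernel approximation at the cost of more computation per sample.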

ogrisel answered Dec 25 '22 01:12


If you are using cross validation or grid search in scikit-learn then you can use multiple CPUs with the n_jobs parameter:

GridSearchCV(..., n_jobs=-1)
cross_val_score(..., n_jobs=-1)

Note that cross_val_score only needs one job per fold, so if you have fewer folds than CPUs you still won't be using all of your processing power.
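To make the two calls above concrete, here is a small runnable sketch. The synthetic dataset and the parameter grid are assumptions chosen only for illustration. With 6 parameter combinations and 5 folds, the grid search has 30 independent fits to spread across cores, whereas cross_val_score has only 5.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hypothetical grid: 3 values of C x 2 values of gamma = 6 candidates.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

# n_jobs=-1 runs the candidate/fold fits in parallel on all available cores.
search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print("best parameters:", search.best_params_)

# Only 5 fits here (one per fold), so at most 5 cores are kept busy.
scores = cross_val_score(SVC(**search.best_params_), X, y, cv=5, n_jobs=-1)
print("cross-validation accuracy per fold:", scores)
```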

LibSVM can use OpenMP if you compile it yourself and use it directly, as per these instructions in the LibSVM FAQ. So you could export your scaled data in LibSVM format (here's a StackOverflow question on how to do that) and use LibSVM directly to train on your data. But that will only be of benefit if you're grid searching or want to know accuracy scores; as far as I know, the model LibSVM creates cannot be used in scikit-learn.
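One possible way to do the export, sketched with scikit-learn's own dump_svmlight_file helper (this is not necessarily the method described in the linked question). The synthetic data, the MinMaxScaler range, and the train.libsvm file name are assumptions for the example. zero_based=False writes 1-based feature indices, which is the convention in LibSVM data files.

```python
import os
import tempfile

from sklearn.datasets import (
    dump_svmlight_file,
    load_svmlight_file,
    make_classification,
)
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Scale features to [-1, 1] before exporting.
X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "train.libsvm")
dump_svmlight_file(X_scaled, y, out_path, zero_based=False)

# Read the file back to check that the export round-trips.
X_back, y_back = load_svmlight_file(out_path, n_features=10, zero_based=False)
print("wrote", out_path, "with shape", X_back.shape)
```

The resulting file can then be passed to the LibSVM command-line tools.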

There is also a GPU-accelerated version of LibSVM, which I have tried and which is extremely fast, but it is not based on the current LibSVM version. I have talked to the developers, and they say they hope to release a new version soon.

Damon Maria answered Dec 25 '22 01:12