 

How to speed up sklearn SVR?

I am implementing SVR using sklearn's SVR class in Python. My sparse matrix is of size 146860 x 10202. I have divided it into sub-matrices of size 2500 x 10202, and fitting an SVR on each sub-matrix takes about 10 minutes. What are some ways to speed up the process? Please suggest a different approach or a different Python package for this. Thanks!

asked Mar 23 '13 by hshed

People also ask

How many free parameters are there in support vector regressors?

Epsilon-Support Vector Regression. The free parameters in the model are C and epsilon. The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a couple of tens of thousands of samples.


1 Answer

You can average the predictions of the SVR sub-models.
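A minimal sketch of this, assuming the data is already loaded into X_train, y_train and X_test (placeholder names, not from the question):

```python
import numpy as np
from sklearn.svm import SVR

# Fit one SVR per chunk of 2500 rows (the sub-matrix size from the question),
# then combine the sub-models at prediction time.
chunk_size = 2500
models = []
for start in range(0, X_train.shape[0], chunk_size):
    svr = SVR(kernel="rbf")
    svr.fit(X_train[start:start + chunk_size], y_train[start:start + chunk_size])
    models.append(svr)

# The ensemble prediction is the mean of the sub-models' predictions.
y_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```

Each chunk fits independently, so the loop can also be parallelized across cores or machines.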

Alternatively, you can fit a linear regression model on the output of a kernel expansion computed with the Nystroem method.
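Something along these lines, as a sketch (the gamma and n_components values are illustrative, not tuned):

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Approximate an RBF kernel map with a few hundred components, then fit a
# fast linear model on the expanded features; Nystroem accepts sparse input.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

This replaces the quadratic-or-worse SVR fit with a single linear fit on n_samples x n_components features, so it can scale to the full 146860-row matrix.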

Or you can try other non-linear regression models, such as ensembles of randomized trees or gradient boosted regression trees.
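For instance (the hyperparameters below are just starting points; recent sklearn versions accept sparse input for both estimators):

```python
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor

# Ensemble of randomized trees; training parallelizes across all cores.
trees = ExtraTreesRegressor(n_estimators=100, n_jobs=-1, random_state=0)
trees.fit(X_train, y_train)

# Gradient boosted regression trees; trains sequentially, but shallow trees
# with a modest learning rate are often competitive.
gbrt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
gbrt.fit(X_train, y_train)
```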

Edit: I forgot to say that the kernel SVR model itself is not scalable: its fit complexity is more than quadratic in the number of samples, hence there is no way to truly "speed it up".

Edit 2: Actually, scaling the input variables to [0, 1] or [-1, 1], or to unit variance using StandardScaler, can often speed up convergence quite a bit.
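For a sparse matrix like this one, subtracting the mean would densify it, so a sketch would use with_mean=False (or MaxAbsScaler, which maps each feature to [-1, 1] and preserves sparsity):

```python
from sklearn.preprocessing import StandardScaler, MaxAbsScaler

# Scale to unit variance without centering, so the matrix stays sparse.
scaler = StandardScaler(with_mean=False)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Alternative: scale each feature to [-1, 1] by its maximum absolute value.
X_train_maxabs = MaxAbsScaler().fit_transform(X_train)
```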

Also, it is very unlikely that the default parameters will yield good results: you have to grid search the optimal values for gamma, and maybe also epsilon, on subsamples of increasing size (to check the stability of the optimal parameters) before fitting large models.
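A sketch of that search on a random subsample (the grid values and the 5000-row subsample size are illustrative; the import path is the modern sklearn.model_selection one):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
    "epsilon": [0.01, 0.1, 1.0],
}

# Search on a small subsample first, then repeat on larger subsamples to
# check that the selected parameters are stable before the expensive fit.
rows = np.random.RandomState(0).choice(X_train.shape[0], 5000, replace=False)
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=3, n_jobs=-1)
search.fit(X_train[rows], y_train[rows])
print(search.best_params_)
```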

answered Sep 20 '22 by ogrisel