I am implementing SVR using scikit-learn's SVR class in Python. My sparse matrix is of size 146860 x 10202. I have divided it into sub-matrices of size 2500 x 10202, and fitting an SVR on each sub-matrix takes about 10 minutes. What are some ways to speed up the process? Please suggest a different approach or a different Python package for this. Thanks!
This is expected behavior for epsilon-Support Vector Regression. The free parameters in the model are C and epsilon, and the implementation is based on libsvm. As the scikit-learn documentation notes, the fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a few tens of thousands of samples.
You can average the predictions of the SVR sub-models.
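For example, here is a minimal sketch of that averaging scheme on toy sparse data; the chunk size matches the 2500-row sub-matrices from the question, and you would swap in your own X, y and X_test:

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import SVR

# Toy sparse data standing in for the 146860 x 10202 matrix in the question.
rng = np.random.RandomState(0)
X = sparse_random(10000, 500, density=0.01, format="csr", random_state=rng)
y = rng.randn(X.shape[0])
X_test = sparse_random(100, 500, density=0.01, format="csr", random_state=rng)

# Fit one SVR per chunk of 2500 rows (the sub-matrix size in the question).
chunk_size = 2500
models = []
for start in range(0, X.shape[0], chunk_size):
    m = SVR(kernel="rbf")
    m.fit(X[start:start + chunk_size], y[start:start + chunk_size])
    models.append(m)

# The ensemble prediction is the mean of the per-chunk predictions.
y_pred = np.mean([m.predict(X_test) for m in models], axis=0)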
Alternatively, you can fit a fast linear regression model on the output of a kernel expansion computed with the Nystroem method.
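A minimal sketch of that approach, reusing X, y and X_test from the snippet above; the gamma and n_components values are illustrative and would need tuning:

from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline

# Approximate the RBF kernel with a low-rank Nystroem expansion, then fit a
# fast linear model on the expanded features.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    SGDRegressor(max_iter=1000, tol=1e-3),
)
model.fit(X, y)
y_pred = model.predict(X_test)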
Or you can try other non-linear regression models, such as ensembles of randomized trees or gradient boosted regression trees; both scale much better with the number of samples than kernel SVR.
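For instance (again reusing X and y; the hyperparameters are just starting points, and older scikit-learn versions may require dense input for these estimators):

from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor

# Ensemble of randomized trees; n_jobs=-1 parallelizes across all cores.
trees = ExtraTreesRegressor(n_estimators=100, n_jobs=-1, random_state=0)
trees.fit(X, y)

# Gradient boosted regression trees.
gbrt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbrt.fit(X, y)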
Edit: I forgot to say that the kernel SVR model itself is not scalable: its fit complexity is more than quadratic in the number of samples, hence there is no way to "speed it up" on the full dataset.
Edit 2: Actually, scaling the input variables to [0, 1] or [-1, 1], or to unit variance using StandardScaler, can often speed up convergence by quite a bit.
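With sparse input, centering would destroy sparsity, so a pipeline along these lines keeps the matrix sparse while scaling (with_mean=False is required for StandardScaler on sparse data; MaxAbsScaler is the sparse-friendly way to get features into [-1, 1]):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Scale to unit variance without centering (centering would densify the
# sparse matrix), then fit the SVR on the scaled features.
model = make_pipeline(StandardScaler(with_mean=False), SVR(kernel="rbf"))
model.fit(X, y)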
Also, it is very unlikely that the default parameters will yield good results: you have to grid search the optimal values for gamma (and maybe also epsilon) on sub-samples of increasing sizes (to check the stability of the optimal parameters) before fitting large models, as in the sketch below.
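A sketch of such a search on a single random sub-sample; the sub-sample size and parameter grids are illustrative only, and you would rerun with larger sub-samples to check stability:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Grid-search gamma and epsilon on a 2000-row sub-sample of X, y.
rng = np.random.RandomState(0)
idx = rng.choice(X.shape[0], size=2000, replace=False)
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"gamma": np.logspace(-4, 0, 5),
                "epsilon": [0.01, 0.1, 1.0]},
    n_jobs=-1,
)
search.fit(X[idx], y[idx])
print(search.best_params_)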