I'm facing the following problem: I'm running an SVR from the scikit-learn library on a training set with about 46,500 observations, and it has been running for more than six hours so far.
I'm using the linear kernel.
from sklearn.svm import SVR

def build_linear(self):
    model = SVR(kernel='linear', C=1)
    return model
I already tried varying the "C" value between 1e-3 and 1000; nothing changes.
The poly kernel runs in about 5 minutes, but I need the values from the linear kernel for an evaluation and can't skip this part...
Does anyone have an idea how to speed this up?
Thanks a lot!
The most likely explanation is that you're using too many training examples for your SVM implementation. SVMs are built around a kernel function. Most implementations explicitly store it as an N×N matrix of kernel values between the training points to avoid computing entries over and over again.
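To get a feel for the scale of that matrix here, a quick back-of-the-envelope calculation (a sketch, assuming one float64 per kernel entry):

n = 46_500
gb = n * n * 8 / 1e9  # full kernel matrix, 8 bytes per float64 entry
print(f"{gb:.1f} GB")  # about 17 GB, far beyond libsvm's default 200 MB cache

That number is why training crawls: most of the matrix cannot be cached, so kernel values are recomputed again and again.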
As background on the hyperparameters: in SVR, epsilon defines the width of the tube around the hyperplane within which errors are ignored, while the regularization parameter C weights the "slack", telling the algorithm how much we care about errors outside the tube.
SVMs are known to scale badly with the number of samples: kernel-SVM training time grows at least quadratically with it!
Instead of SVR with a linear kernel, use LinearSVR, or for huge data: SGDRegressor.
LinearSVR is more restricted in what it can compute (no non-linear kernels), and more restricted algorithms usually make more assumptions, which they exploit to speed things up (or save memory).
SVR is based on libsvm, while LinearSVR is based on liblinear. Both are well-tested high-quality implementations.
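A minimal sketch of the LinearSVR route, with synthetic data standing in for the real training set (the feature count, scaling step, and max_iter value are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in for the ~46,500-observation training set.
X, y = make_regression(n_samples=46_500, n_features=20, noise=0.1, random_state=0)

# liblinear scales roughly linearly with the number of samples;
# standardizing the features also helps it converge faster.
model = make_pipeline(
    StandardScaler(),
    LinearSVR(C=1, max_iter=10_000, random_state=0),
)
model.fit(X, y)

# For even larger data, SGDRegressor(loss='epsilon_insensitive')
# optimizes a similar objective via stochastic gradient descent.

Since no kernel matrix is ever formed, this should finish in seconds to minutes rather than hours.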
(It might be valuable to add: in cases like these, don't waste time waiting 6 hours in general. Sub-sample your data, fit on progressively larger subsets, and deduce the runtime or spot problems from the trend; a sketch of this follows below. Edit: it seems you did that already, good!)
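Here is one way such a sub-sampling check could look (a sketch; the subset sizes and synthetic data are placeholders):

import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=16_000, n_features=20, noise=0.1, random_state=0)

# Time fits on growing subsets and extrapolate the trend before
# committing to the full data set.
rng = np.random.RandomState(0)
for n in (1_000, 2_000, 4_000, 8_000):
    idx = rng.choice(len(X), size=n, replace=False)
    start = time.perf_counter()
    SVR(kernel='linear', C=1).fit(X[idx], y[idx])
    print(f"n={n:>5}: {time.perf_counter() - start:.1f}s")

If the time per fit grows much faster than the sample count, you know the full run is infeasible long before six hours have passed.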