I'm facing the following problem: I'm running an SVR from the scikit-learn library on a training set with about 46,500 observations, and it has been running for more than six hours so far.
I'm using the linear kernel.
from sklearn.svm import SVR

def build_linear(self):
    model = SVR(kernel='linear', C=1)
    return model
I already tried varying the "C" value between 1e-3 and 1000; nothing changes.
The poly kernel runs in about 5 minutes, but I need the values from the linear kernel for an evaluation and can't skip this part...
Does anyone have an idea how to speed this up?
Thanks a lot!
The most likely explanation is that you're using too many training examples for your SVM implementation. SVMs are built around a kernel function. Most implementations explicitly store it as an N×N matrix of kernel values between the training points to avoid computing entries over and over again.
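To get a feel for the scale of that matrix here, a quick back-of-the-envelope calculation (a sketch, assuming one float64 per kernel entry):

n = 46_500
gb = n * n * 8 / 1e9  # full kernel matrix, 8 bytes per float64 entry
print(f"{gb:.1f} GB")  # about 17 GB, far beyond libsvm's default 200 MB cache

That number is why training crawls: most of the matrix cannot be cached, so kernel values are recomputed again and again.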
As background on the hyperparameters: in SVR, epsilon defines the width of the tube around the hyperplane within which errors are ignored, while the regularization parameter C weights the "slack", telling the algorithm how much we care about errors outside the tube.
SVMs are known to scale badly with the number of samples: kernel-SVM training time grows at least quadratically with it!
Instead of SVR with a linear kernel, use LinearSVR, or for huge data: SGDRegressor.
LinearSVR is more restricted in what it can compute (no non-linear kernels), and more restricted algorithms usually make more assumptions, which they exploit to speed things up (or save memory).
SVR is based on libsvm, while LinearSVR is based on liblinear. Both are well-tested high-quality implementations.
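A minimal sketch of the LinearSVR route, with synthetic data standing in for the real training set (the feature count, scaling step, and max_iter value are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in for the ~46,500-observation training set.
X, y = make_regression(n_samples=46_500, n_features=20, noise=0.1, random_state=0)

# liblinear scales roughly linearly with the number of samples;
# standardizing the features also helps it converge faster.
model = make_pipeline(
    StandardScaler(),
    LinearSVR(C=1, max_iter=10_000, random_state=0),
)
model.fit(X, y)

# For even larger data, SGDRegressor(loss='epsilon_insensitive')
# optimizes a similar objective via stochastic gradient descent.

Since no kernel matrix is ever formed, this should finish in seconds to minutes rather than hours.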
(It might be valuable to add: in cases like these, don't waste time waiting 6 hours in general. Sub-sample your data, fit on progressively larger subsets, and deduce the runtime or spot problems from the trend; a sketch of this follows below. Edit: it seems you did that already, good!)
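Here is one way such a sub-sampling check could look (a sketch; the subset sizes and synthetic data are placeholders):

import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=16_000, n_features=20, noise=0.1, random_state=0)

# Time fits on growing subsets and extrapolate the trend before
# committing to the full data set.
rng = np.random.RandomState(0)
for n in (1_000, 2_000, 4_000, 8_000):
    idx = rng.choice(len(X), size=n, replace=False)
    start = time.perf_counter()
    SVR(kernel='linear', C=1).fit(X[idx], y[idx])
    print(f"n={n:>5}: {time.perf_counter() - start:.1f}s")

If the time per fit grows much faster than the sample count, you know the full run is infeasible long before six hours have passed.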