From my research, I found three conflicting results:
SVC(kernel="linear") is better
LinearSVC is better
Can someone explain when to use LinearSVC vs. SVC(kernel="linear")?
It seems like LinearSVC is marginally better than SVC and is usually more finicky. But if scikit-learn decided to spend time on implementing a specific case for linear classification, why wouldn't LinearSVC outperform SVC?
The main difference between them is that LinearSVC lets you choose only a linear classifier, whereas SVC lets you choose from a variety of non-linear kernels. However, using SVC for non-linear problems is often discouraged on large datasets, because training becomes very slow as the number of samples grows.
The objective of a Linear SVC (Support Vector Classifier) is to fit to the data you provide, returning a "best fit" hyperplane that divides, or categorizes, your data. From there, after getting the hyperplane, you can then feed some features to your classifier to see what the "predicted" class is.
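As a minimal sketch of that fit-then-predict workflow (the toy points, labels, and parameter values below are made up for illustration and are not from the thread):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data: two well-separated clusters in 2-D.
X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit the linear classifier; max_iter is raised only to avoid
# convergence warnings on some versions.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X, y)

# The learned hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]

# Feed new feature vectors to the classifier to get predicted classes.
new_points = np.array([[0.2, 0.1], [4.8, 4.4]])
pred = clf.predict(new_points)
print(w, b, pred)
```

The coef_ and intercept_ attributes describe the hyperplane; predict only reports on which side of it each new point falls.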
Some tutorials draw the line differently: they call the algorithm SVC when the hyperplane separates the dataset linearly, and SVM when it separates the dataset with a non-linear approach, which compensates for the limitation of the linear case. That naming does not match scikit-learn, where SVC is the support vector classifier class and supports both linear and non-linear kernels.
The key points of that difference are the following: by default, LinearSVC minimizes the squared hinge loss while SVC minimizes the regular hinge loss. It is possible to select the regular hinge loss manually by passing the string 'hinge' as the loss parameter of LinearSVC.
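A small sketch of switching the loss, assuming a reasonably recent scikit-learn. The explicit dual=True is my addition: as far as I know, the L2 penalty combined with the hinge loss is only supported in the dual formulation.

```python
from sklearn.svm import LinearSVC

# Default configuration: squared hinge loss.
default_clf = LinearSVC()

# Regular hinge loss, closer to what SVC(kernel="linear") minimizes.
# dual=True is set explicitly because penalty="l2" with loss="hinge"
# is not available in the primal formulation.
hinge_clf = LinearSVC(loss="hinge", dual=True)

print(default_clf.loss)
print(hinge_clf.loss)
```

Switching the loss alone does not make the two estimators identical; the intercept handling discussed in the answers below still differs.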
Mathematically, optimizing an SVM is a convex optimization problem, usually with a unique minimizer. This means that there is only one solution to this mathematical optimization problem.
The differences in results come from several aspects: SVC and LinearSVC are supposed to optimize the same problem, but in fact all liblinear estimators penalize the intercept, whereas libsvm ones don't (IIRC). This leads to a different mathematical optimization problem and thus different results. There may also be other subtle differences such as scaling and default loss function (edit: make sure you set loss='hinge' in LinearSVC). Next, in multiclass classification, liblinear does one-vs-rest by default whereas libsvm does one-vs-one.
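The one-vs-rest versus one-vs-one point can be seen in the shape of the fitted coefficients. This is a rough sketch on a synthetic 4-class dataset; the dataset and all parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic 4-class problem (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# liblinear: one binary classifier per class (one-vs-rest).
ovr = LinearSVC(loss="hinge", dual=True, max_iter=20000).fit(X, y)

# libsvm: one binary classifier per pair of classes (one-vs-one).
ovo = SVC(kernel="linear").fit(X, y)

ovr_shape = ovr.coef_.shape  # one row per class
ovo_shape = ovo.coef_.shape  # one row per pair of classes
print(ovr_shape, ovo_shape)
```

With 4 classes there are 4 one-vs-rest classifiers but 4 * 3 / 2 = 6 pairwise ones, so the two models are not even parameterized the same way, let alone guaranteed to agree.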
SGDClassifier(loss='hinge') is different from the other two in the sense that it uses stochastic gradient descent and not exact gradient descent and may not converge to the same solution. However, the obtained solution may generalize better.
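For completeness, a minimal sketch of the SGD variant. The synthetic data, the scaling step, and the max_iter, tol, and random_state values are my own choices, not something prescribed in the thread.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# SGD is sensitive to feature scale, so standardize first.
sgd = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
sgd.fit(X, y)

acc = sgd.score(X, y)  # training accuracy, only a sanity check
coef = sgd.named_steps["sgdclassifier"].coef_
print(acc, coef.shape)
```

Because the updates are stochastic, a different random_state can give slightly different coefficients, which is one reason the results need not match SVC or LinearSVC.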
Between SVC and LinearSVC, one important decision criterion is that LinearSVC tends to be faster to converge the larger the number of samples is. This is due to the fact that the linear kernel is a special case, which is optimized for in Liblinear, but not in Libsvm.
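One way to check the speed claim on your own machine is a small timing loop like the sketch below. The sample sizes, the synthetic data, and the fit_seconds helper are hypothetical, and the absolute numbers will vary with hardware and library version.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC


def fit_seconds(estimator, X, y):
    """Return the wall-clock time needed to fit the estimator."""
    start = time.perf_counter()
    estimator.fit(X, y)
    return time.perf_counter() - start


timings = {}
for n in (500, 1000, 2000):
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    t_linear = fit_seconds(LinearSVC(dual=False), X, y)
    t_svc = fit_seconds(SVC(kernel="linear"), X, y)
    timings[n] = (t_linear, t_svc)
    print(f"n={n}: LinearSVC {t_linear:.4f}s, SVC(linear) {t_svc:.4f}s")
```

The gap is easiest to see by increasing n; a single small run is noisy and should not be read as a benchmark.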
The actual problem lies in the scikit-learn approach, which labels as an SVM something that is not strictly an SVM. LinearSVC minimizes the squared hinge loss instead of the plain hinge loss, and it also penalizes the size of the bias, which a standard SVM does not do. For more details, refer to the other question: Under what parameters are SVC and LinearSVC in scikit-learn equivalent?
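Written out, the two objectives look roughly as follows. This is a sketch under my own assumptions: a binary problem with labels y_i in {-1, +1}, a linear kernel, default penalty and loss, and intercept_scaling=1 in LinearSVC. The symbols w, b, C, x_i are my notation, not taken from the thread.

```latex
\begin{aligned}
\text{SVC (libsvm):}\quad
  & \min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
    + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i\,(w^{\top}x_i + b)\bigr) \\
\text{LinearSVC (liblinear, defaults):}\quad
  & \min_{w,\,b}\ \tfrac{1}{2}\bigl(\lVert w\rVert^{2} + b^{2}\bigr)
    + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i\,(w^{\top}x_i + b)\bigr)^{2}
\end{aligned}
```

The two differences named above are visible directly: the extra b^2 term in the regularizer and the square on the hinge term.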
So which one to use? It is purely problem specific. Because of the no free lunch theorem, it is impossible to say "this loss function is best, period". Sometimes the squared hinge loss will work better, sometimes the regular hinge loss.