According to this post, SVC and LinearSVC in scikit-learn are very different. But when reading the official scikit-learn documentation, it is not that clear.
Especially for the loss functions, it seems that there is an equivalence:
And this post says that the loss functions are different:

SVC:       min 1/2 ||w||^2 + C SUM_i xi_i
LinearSVC: min 1/2 ||[w b]||^2 + C SUM_i xi_i
It seems that in the case of LinearSVC, the intercept is regularized, but the official documentation says otherwise.
Does anyone have more information? Thank you
The key differences are the following: by default, LinearSVC minimizes the squared hinge loss, while SVC minimizes the regular hinge loss. You can manually set loss='hinge' in LinearSVC to use the regular hinge loss instead.
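As a quick sketch of that loss parameter (the dataset here is synthetic, generated only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy linearly-separable-ish problem
X, y = make_classification(n_samples=200, random_state=0)

# Default objective: squared hinge loss
clf_sq = LinearSVC(loss="squared_hinge", random_state=0).fit(X, y)

# Regular hinge loss, closer to SVC's objective
clf_hinge = LinearSVC(loss="hinge", random_state=0).fit(X, y)

print(clf_sq.score(X, y), clf_hinge.score(X, y))
```

On most datasets the two losses give similar accuracy; the difference is mainly in how strongly large margin violations are penalized.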
Between SVC and LinearSVC, one important decision criterion is that LinearSVC tends to converge faster the larger the number of samples is. This is due to the fact that the linear kernel is a special case, which is optimized for in Liblinear, but not in Libsvm.
The linear-only limitation is compensated in SVC by its kernels: if a hyperplane can classify the dataset linearly, a linear model such as LinearSVC suffices, but when the dataset requires a non-linear decision boundary, SVC with a non-linear kernel can still separate it, while LinearSVC cannot.
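A minimal sketch of that point, using scikit-learn's make_circles to build a dataset that no linear boundary can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC, LinearSVC

# Two concentric circles: not linearly separable
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = LinearSVC(random_state=0).fit(X, y)  # linear boundary only
rbf = SVC(kernel="rbf").fit(X, y)             # non-linear RBF kernel

print(linear.score(X, y))  # near chance level
print(rbf.score(X, y))     # near perfect
</```

The linear model hovers around 50% accuracy on this data, while the kernelized SVC separates the circles almost perfectly.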
In scikit-learn, SVC and NuSVC are mathematically equivalent, with both methods based on the library libsvm. The main difference is that SVC uses the parameter C while NuSVC uses the parameter nu. LinearSVC is based on the library liblinear.
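A short sketch of the two parameterizations (the nu value here is an arbitrary choice for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, NuSVC

X, y = make_classification(n_samples=200, random_state=0)

# SVC: C penalizes margin violations (larger C = harder margin)
svc = SVC(C=1.0, kernel="rbf").fit(X, y)

# NuSVC: nu in (0, 1] bounds the fraction of margin errors
# and lower-bounds the fraction of support vectors
nusvc = NuSVC(nu=0.3, kernel="rbf").fit(X, y)

print(svc.score(X, y), nusvc.score(X, y))
```

Both solve the same underlying problem; nu is often easier to interpret because it is a fraction of the training set rather than an unbounded penalty.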
SVC is a wrapper of the LIBSVM library, while LinearSVC is a wrapper of LIBLINEAR. LinearSVC is generally faster than SVC and can work with much larger datasets, but it can only use a linear kernel, hence its name. So the difference lies not in the formulation but in the implementation approach.
Quoting the LIBLINEAR FAQ:
When to use LIBLINEAR but not LIBSVM
There are some large data for which with/without nonlinear mappings gives similar performances.
Without using kernels, one can quickly train a much larger set via a linear classifier.
Document classification is one such application.
In the following example (20,242 instances and 47,236 features; available on LIBSVM data sets),
the cross-validation time is significantly reduced by using LIBLINEAR:
% time libsvm-2.85/svm-train -c 4 -t 0 -e 0.1 -m 800 -v 5 rcv1_train.binary
Cross Validation Accuracy = 96.8136%
345.569s
% time liblinear-1.21/train -c 4 -e 0.1 -v 5 rcv1_train.binary
Cross Validation Accuracy = 97.0161%
2.944s
Warning: While LIBLINEAR's default solver is very fast for document classification, it may be slow in other situations. See Appendix C of our SVM guide about using other solvers in LIBLINEAR.
Warning: If you are a beginner and your data sets are not large, you should consider LIBSVM first.
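A rough scikit-learn analogue of the FAQ's comparison can be sketched as follows; it is just an illustration on synthetic data, and the absolute timings vary by machine:

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# A moderately sized dense problem; the gap grows with the sample count
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

t0 = time.perf_counter()
lin = LinearSVC(random_state=0).fit(X, y)  # liblinear solver
t_liblinear = time.perf_counter() - t0

t0 = time.perf_counter()
svm = SVC(kernel="linear").fit(X, y)       # libsvm with a linear kernel
t_libsvm = time.perf_counter() - t0

print(f"liblinear: {t_liblinear:.2f}s  libsvm: {t_libsvm:.2f}s")
```

Both fit the same kind of linear decision function; only the underlying optimizer differs.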