In sklearn, what is the difference between an SVM model with a linear kernel and an SGD classifier with loss='hinge'?

I see that in scikit-learn I can build an SVM classifier with a linear kernel in at least 3 different ways (a minimal sketch follows the list):

  • LinearSVC
  • SVC with kernel='linear' parameter
  • Stochastic Gradient Descent with loss='hinge' parameter
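
For concreteness, here is a minimal sketch of the three constructions (note that their default hyperparameters differ, e.g. regularization strength and intercept handling, so the fitted models will not be numerically identical):

    from sklearn.svm import LinearSVC, SVC
    from sklearn.linear_model import SGDClassifier

    clf_liblinear = LinearSVC()              # linear SVM via liblinear
    clf_libsvm = SVC(kernel='linear')        # linear SVM via libsvm
    clf_sgd = SGDClassifier(loss='hinge')    # linear SVM trained with SGD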

Now, I see that the difference between the first two classifiers is that the former is implemented in terms of liblinear and the latter in terms of libsvm.

How do the first two classifiers differ from the third one?

asked Apr 17 '15 by JackNova



1 Answer

The first two always use the full training data and solve a convex optimization problem over those data points.

The latter can process the data in batches and performs gradient descent, aiming to minimize the expected loss with respect to the sample distribution, assuming that the examples are i.i.d. draws from that distribution.
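
In symbols (a sketch, with lambda standing for the regularization strength): the batch solvers minimize the regularized empirical hinge loss over all n training points, while SGD takes per-example gradient steps toward the minimizer of the expected loss:

    % batch solvers (LinearSVC, SVC(kernel='linear')): empirical risk over the full data
    \min_w \; \frac{1}{n} \sum_{i=1}^{n} \max(0,\, 1 - y_i w^\top x_i) + \lambda \lVert w \rVert^2

    % SGDClassifier(loss='hinge'): stochastic steps toward the expected risk
    \min_w \; \mathbb{E}_{(x,y)}\left[\max(0,\, 1 - y\, w^\top x)\right] + \lambda \lVert w \rVert^2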

SGD is typically used when the number of samples is very large, or when the data arrive as an unbounded stream. Observe that you can call the partial_fit method and feed it chunks of data.
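
For example, a minimal out-of-core sketch (the chunked data below is made up purely for illustration):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    clf = SGDClassifier(loss='hinge')
    all_classes = np.array([0, 1])  # partial_fit must be told every class label up front

    for _ in range(10):  # pretend each iteration is a chunk arriving from a stream
        X_chunk = np.random.randn(100, 5)            # placeholder features
        y_chunk = (X_chunk[:, 0] > 0).astype(int)    # placeholder labels
        clf.partial_fit(X_chunk, y_chunk, classes=all_classes)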

Hope this helps!

answered Oct 03 '22 by eickenberg