As I understand the large margin effect in SVM:
For example, let's look at this image:
In the SVM optimization objective, the regularization term pushes us toward a set of parameters where the norm of the parameter vector theta is small. So we must find a theta with a small norm while the projections p of the positive examples onto this vector are large (to compensate for the small theta in the inner product). At the same time, a large p gives us a large margin. In this image we find the ideal theta, and with it a large p (and a large margin):
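To make that relationship concrete, here is the derivation I have in mind (a sketch, assuming the standard SVM constraint $\theta^T x \ge 1$ for positive examples, as in the Coursera formulation): writing the inner product through the projection $p$ of $x$ onto $\theta$,

$$\theta^T x = p \, \|\theta\| \ge 1 \quad\Rightarrow\quad p \ge \frac{1}{\|\theta\|},$$

so the smaller the regularization term makes $\|\theta\|$, the larger the projection $p$ (and hence the margin) must be for the constraint to still hold.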
Why is logistic regression not a large margin classifier? In LR we minimize the norm of the theta vector through the regularization term in the same way. Maybe I misunderstood something; if so, please correct me.
I've used the images and theory from the Coursera ML class.
The concept of large margins is a unifying principle for the analysis of many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and support vector machines.
Logistic regression is basically a supervised classification algorithm: in a classification problem, the target variable (or output), y, can take only discrete values for a given set of features (or inputs), X. Contrary to popular belief, though, logistic regression is at its core a regression model: it regresses the log-odds of the positive class on the features.
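For reference, the model behind that statement (standard notation, not taken from the original post):

$$P(y = 1 \mid x) = \sigma(\theta^T x) = \frac{1}{1 + \exp(-\theta^T x)}, \qquad \log\frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = \theta^T x,$$

i.e. a linear regression on the log-odds, which is why it remains a regression model even though it is used for classification.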
In the figure this answer refers to, the SVM decision function, shown in red, has not changed: it still passes through (-1, 0). The logistic regression boundary, shown in green, however, has moved farther away from the '+' class. Thus, we cannot call logistic regression a maximum-margin classifier.
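A minimal sketch of that behaviour, assuming scikit-learn and a made-up one-dimensional toy set (the numbers are purely illustrative): a '+' point added well beyond the margin leaves the hinge-loss boundary where it was, while the logistic boundary drifts slightly away from the '+' class, because the logistic loss of a well-classified point is small but never exactly zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy 1-D data: '-' class near -1, '+' class near +1.
X = np.array([[-1.5], [-1.0], [1.0], [1.5]])
y = np.array([0, 0, 1, 1])

# The same data plus one extra '+' point well beyond the SVM margin.
X_extra = np.vstack([X, [[2.5]]])
y_extra = np.append(y, 1)

def boundary(clf, X, y):
    """Fit and return the x where the decision function crosses zero (-b / w)."""
    clf.fit(X, y)
    return -clf.intercept_[0] / clf.coef_[0, 0]

for name, clf in [("SVM (hinge loss)", SVC(kernel="linear", C=1.0)),
                  ("logistic regression", LogisticRegression(C=1.0))]:
    before = boundary(clf, X, y)
    after = boundary(clf, X_extra, y_extra)
    # Expected: the SVM boundary stays (essentially) where it was, while the
    # LR boundary shifts a little towards the '-' class.
    print(f"{name:20s}: boundary {before:+.3f} -> {after:+.3f}")
```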
A classifier with a large margin makes no low-certainty classification decisions. This gives you a classification safety margin: a slight error in measurement or a slight document variation will not cause a misclassification. Another intuition motivating SVMs is shown in Figure 15.2.
The logistic regression loss is a large-margin loss. LeCun mentions this in one or more of his papers on energy-based learning.
To see that LR does induce a margin, it is easier to look at the softmax loss (which reduces to the LR loss in the two-class case).
There are two terms in the softmax loss: $L(z) = z_{\text{true}} - \log\left(\sum_i \exp(z_i)\right)$
which means that the score of an example for its true class (its signed distance from the true class's decision boundary) needs to beat the log of the sum of exponentiated scores over all of the classes, i.e. the log-sum-exp, which acts as a soft maximum over the competing boundaries.
Because the softmax is a probability distribution, the largest value the log softmax can take is 0, so the log softmax returns a negative value (i.e. a penalty) that approaches 0 as the probability of the true class under the softmax approaches 1.
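A tiny numerical illustration of that point (plain NumPy; the score values are arbitrary, with the competing class's score fixed at 0 so the true-class score plays the role of the margin):

```python
import numpy as np

def log_softmax_true(z, true_idx):
    """log P(true class) under the softmax: z_true - log(sum_i exp(z_i))."""
    return z[true_idx] - np.log(np.sum(np.exp(z)))

# Two classes; the competing class's score is fixed at 0, so the true-class
# score acts as the margin. The penalty (negative log-softmax) approaches 0
# as the margin grows, but never quite reaches it.
for margin in [0.0, 1.0, 2.0, 5.0, 10.0]:
    z = np.array([margin, 0.0])   # [true-class score, other-class score]
    print(f"margin {margin:5.1f} -> penalty {-log_softmax_true(z, 0):.6f}")
```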