Given a linearly separable dataset, is it necessarily better to use a hard-margin SVM over a soft-margin SVM?


SVM - hard or soft margins?

2 Answers

I would expect a soft-margin SVM to be better even when the training dataset is linearly separable. The reason is that in a hard-margin SVM, a single outlier can determine the boundary, which makes the classifier overly sensitive to noise in the data.

In the diagram below, a single red outlier essentially determines the boundary, which is the hallmark of overfitting.

[Figure: hard-margin SVM boundary determined by a single red outlier (https://i.stack.imgur.com/p8mA3.png)]

To get a sense of what a soft-margin SVM is doing, it's better to look at it in the dual formulation. There you can see that it has the same margin-maximizing objective (the margin could be negative) as the hard-margin SVM, but with an additional constraint that each Lagrange multiplier associated with a support vector is bounded by C. Essentially, this bounds the influence of any single point on the decision boundary. For the derivation, see Proposition 6.12 in Cristianini and Shawe-Taylor's "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods".

The result is that a soft-margin SVM can choose a decision boundary that has non-zero training error even if the dataset is linearly separable, and is less likely to overfit.
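For reference, here is the dual problem mentioned above in its standard textbook form (1-norm soft margin). The notation is an assumption on my part and is not copied from the cited proposition: alpha_i are the Lagrange multipliers, (x_i, y_i) the training pairs, and n the number of training points.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, \langle x_i, x_j \rangle \\
\text{s.t.}\quad & \sum_{i=1}^{n} \alpha_i y_i = 0, \\
& 0 \le \alpha_i \le C \quad \text{(soft margin)},
  \qquad \alpha_i \ge 0 \quad \text{(hard margin)}.
\end{aligned}
```

The objective is identical in both cases. The only difference is the upper bound C on each alpha_i, which is what caps the influence of any single training point.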

Here's an example using libSVM on a synthetic problem. Circled points show support vectors. You can see that decreasing C causes the classifier to sacrifice linear separability in order to gain stability, in the sense that the influence of any single datapoint is now bounded by C.

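[Figure: libSVM decision boundaries on a synthetic problem for different values of C, with support vectors circled (https://i.stack.imgur.com/0aYO8.png)]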
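If you want to reproduce the effect of C without libSVM, here is a minimal pure-Python sketch. It is not the code behind the figure: it swaps in dual coordinate descent for libSVM's solver, folds the bias into a constant feature (so the bias is regularized too), and uses a made-up toy dataset. The names train_linear_svm_dual and training_errors are hypothetical helpers.

```python
# Toy comparison of a near-hard-margin and a soft-margin linear SVM.
# Solver: dual coordinate descent (NOT libSVM's solver); names are illustrative.

def train_linear_svm_dual(X, y, C, sweeps=5000):
    """Maximize the SVM dual subject to the box constraint 0 <= alpha_i <= C.

    Assumption: the bias is a constant extra feature, so it is regularized
    and the equality constraint sum(alpha_i * y_i) = 0 disappears.
    """
    Xa = [list(x) + [1.0] for x in X]          # append the bias feature
    n, d = len(Xa), len(Xa[0])
    alpha = [0.0] * n
    w = [0.0] * d                              # w = sum_i alpha_i * y_i * x_i
    q = [sum(v * v for v in x) for x in Xa]    # diagonal of the dual Hessian
    for _ in range(sweeps):
        for i in range(n):
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            # Exact 1-D minimizer for alpha_i, then clip to the box [0, C].
            new_alpha = min(max(alpha[i] - (margin - 1.0) / q[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for j in range(d):
                    w[j] += delta * y[i] * Xa[i][j]
                alpha[i] = new_alpha
    return alpha, w


def training_errors(X, y, w):
    """Count training points on the wrong side of the learned boundary."""
    scores = [sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) for x in X]
    return sum(1 for s, label in zip(scores, y) if s * label <= 0)


# Two well-separated clusters plus one positive outlier next to the negatives.
# The set is still linearly separable, but only with a very small margin.
X = [(2, 2), (3, 3), (3, 2), (2, 3), (-1.5, -1.5),
     (-2, -2), (-3, -3), (-3, -2), (-2, -3)]
y = [1, 1, 1, 1, 1, -1, -1, -1, -1]

results = {}
for C in (1000.0, 0.005):
    alpha, w = train_linear_svm_dual(X, y, C)
    results[C] = (alpha, w, training_errors(X, y, w))
    print(f"C={C}: max alpha={max(alpha):.3f}, training errors={results[C][2]}")
```

On this toy set the large-C run should separate every point, with the outlier carrying a large multiplier. The small-C run caps every multiplier at C and lets the outlier fall on the wrong side, which is the trade-off described above.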

Meaning of support vectors:

For a hard-margin SVM, support vectors are the points which are "on the margin". In the picture above, C=1000 is pretty close to a hard-margin SVM, and you can see that the circled points are the ones that touch the margin (the margin is almost 0 in that picture, so it's essentially the same as the separating hyperplane).

For a soft-margin SVM, it's easier to explain them in terms of dual variables. Your support vector predictor in terms of dual variables is the following function.

[Figure: the support vector predictor expressed in terms of dual variables (https://i.stack.imgur.com/wzgIb.png)]

Here, the alphas and b are parameters found during the training procedure, the xi's and yi's are your training set, and x is the new datapoint. Support vectors are the datapoints from the training set that are included in the predictor, i.e., the ones with a non-zero alpha parameter.
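To make the role of the alphas concrete, here is a small sketch that evaluates a predictor of the usual form f(x) = sign(sum_i alpha_i * y_i * <x_i, x> + b). I am assuming a plain dot product as the kernel, which may differ from the formula in the image. The training set is a toy one, and its alphas and b were worked out by hand for the hard-margin case rather than produced by a trainer. The function names are hypothetical.

```python
# Evaluate f(x) = sign(sum_i alpha_i * y_i * <x_i, x> + b) on a toy training set.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, alphas, ys, xs, b):
    # Points with alpha_i == 0 contribute nothing, so only support vectors matter.
    return sum(a * yi * dot(xi, x)
               for a, yi, xi in zip(alphas, ys, xs) if a != 0.0) + b


def predict(x, alphas, ys, xs, b):
    return 1 if decision_value(x, alphas, ys, xs, b) >= 0 else -1


# Toy data: the third point lies far from the boundary.
xs = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
ys = [1, -1, 1]
alphas = [0.25, 0.25, 0.0]   # hand-derived hard-margin solution (assumption)
b = 0.0

# Support vectors are exactly the training points with non-zero alpha.
support_vectors = [xi for a, xi in zip(alphas, xs) if a != 0.0]

# y_i * f(x_i): equal to 1 for points on the margin, larger for points beyond it.
margins = [yi * decision_value(xi, alphas, ys, xs, b) for xi, yi in zip(xs, ys)]

print(support_vectors, margins, predict((2.0, 0.0), alphas, ys, xs, b))
```

Dropping the third point from xs, ys and alphas would leave every prediction unchanged, since its alpha is zero. That is the sense in which only the support vectors are "included in the predictor".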

195

answered Sep 21 '22 05:09

Yaroslav Bulatov

In my opinion, a hard-margin SVM overfits to a particular dataset and thus cannot generalize. Even in a linearly separable dataset (as shown in the diagram above), outliers well within the boundaries can influence the margin. A soft-margin SVM is more versatile because we can control the choice of support vectors by tuning C.

answered Sep 21 '22 05:09

codingJitters


Tags: algorithm, machine-learning, svm

D.G
