Help me understand linear separability in a binary SVM

I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me.

My question pertains to linear separability with hyperplanes in a support vector machine.

According to Wikipedia:

...formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

The linear separation of classes by hyperplanes intuitively makes sense to me, and I think I understand linear separability in two-dimensional geometry. However, I'm implementing an SVM using a popular SVM library (LIBSVM), and when experimenting with the numbers I fail to understand how an SVM can draw a curve between classes, or enclose central points of category 1 within a circular boundary when they are surrounded by points of category 2. After all, a hyperplane in an n-dimensional space V is a "flat" subset of dimension n − 1, which for a two-dimensional space is a 1D line.

Here is what I mean:

circularly enclosed class separation for a 2D binary SVM

That's not a hyperplane. That's circular. How does this work? Or are there more dimensions inside the SVM than the two input features?

This example application can be downloaded here.

Edit:

Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (e.g. 0): around that point it neatly fits into category 1, but well above and below zero it fits into category 2. Since I know this, would it help classification to send the absolute value of this feature to the SVM?
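To make the idea in this edit concrete, here is a minimal sketch in plain Python. It does not train an SVM; it only checks whether a single threshold (which is all a linear classifier can do on one feature) separates the two categories, before and after taking the absolute value. The sample values and the helper name `separable_by_threshold` are made up for illustration.

```python
def separable_by_threshold(values, labels):
    """Return True if one threshold puts category 1 on one side and category 2 on the other.

    On a single feature, a linear decision function sign(w*x + b) is just a threshold,
    so this is the 1D version of linear separability.
    """
    cat1 = [v for v, lab in zip(values, labels) if lab == 1]
    cat2 = [v for v, lab in zip(values, labels) if lab == 2]
    return max(cat1) < min(cat2) or max(cat2) < min(cat1)

# Hypothetical feature: category 1 sits near zero, category 2 lies far above and below it.
xs = [-3.0, -2.0, -0.4, -0.1, 0.0, 0.2, 0.5, 2.5, 3.5]
ys = [2, 2, 1, 1, 1, 1, 1, 2, 2]

raw_ok = separable_by_threshold(xs, ys)                    # raw feature
abs_ok = separable_by_threshold([abs(x) for x in xs], ys)  # "linearized" feature

print(raw_ok, abs_ok)  # prints: False True
```

On this toy data the raw feature cannot be split by one threshold, while its absolute value can. Whether this helps in practice depends on the kernel: a linear kernel would need the transformed feature, whereas a nonlinear kernel may already cope with the raw one.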

asked Oct 22 '10 13:10

Petrus Theron

2 Answers

As mokus explained, support vector machines use a kernel function to implicitly map the data into a feature space where the classes can become linearly separable:

SVM mapping one feature space into another

Different kernel functions are used for various kinds of data. Note that an extra dimension (feature) is added by the transformation in the picture, although this feature is never materialized in memory.

(Illustration from Chris Thornton, U. Sussex.)

answered Jan 03 '23 14:01

Fred Foo

Check out this YouTube video that illustrates an example of linearly inseparable points that become separable by a plane when mapped to a higher dimension.
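The same effect can be reproduced in a few lines. This is only a sketch with made-up points, and the lifting function (adding x1² + x2² as a third coordinate) and the plane's coefficients are assumptions picked by hand, not values learned by an SVM or taken from the video.

```python
import math

def ring(radius, n):
    """n evenly spaced 2D points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring(0.5, 8) + ring(1.0, 8)    # category 1: central points
outer = ring(2.0, 12) + ring(3.0, 12)  # category 2: surrounding points

def lift(p):
    """Map a 2D point into 3D by adding its squared distance from the origin."""
    x1, x2 = p
    return (x1, x2, x1 * x1 + x2 * x2)

# Hand-picked plane w . z + b = 0 in the lifted space: the flat surface z3 = 2.25.
w = (0.0, 0.0, 1.0)
b = -2.25

def side(p):
    """Signed value of the plane's decision function for a lifted 2D point."""
    z = lift(p)
    return sum(wi * zi for wi, zi in zip(w, z)) + b

inner_below = all(side(p) < 0 for p in inner)
outer_above = all(side(p) > 0 for p in outer)

print(inner_below, outer_above)  # prints: True True
```

In 3D the boundary is a flat plane, but the set of 2D points that land exactly on it satisfies x1² + x2² = 2.25, which is a circle of radius 1.5 in the original input space.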

answered Jan 03 '23 14:01

Amro

Tags:

algorithm

machine-learning

classification

svm

libsvm