Designing a Kernel for a support vector machine (XOR)

The meat of my question is "how does one design a kernel function for a learning problem?"

As quick background: I'm reading books on support vector machines and kernel machines, and everywhere I look authors give examples of kernels (homogeneous and inhomogeneous polynomial kernels, Gaussian kernels, and allusions to text-based kernels, to name a few), but they all either show pictures of the results without specifying the kernel, or vaguely claim that "an efficient kernel can be constructed". I'm interested in the process that goes on when one designs a kernel for a new problem.

Probably the easiest example is learning XOR: the smallest (4-point) non-linearly separable data set, embedded in the real plane. How would one come up with a natural (and non-trivial) kernel that linearly separates this data?

As a more complex example (see Cristianini, Introduction to SVMs, figure 6.2), how would one design a kernel to learn a checkerboard pattern? Cristianini states that the picture was derived "using Gaussian kernels", but it seems he uses several, combined and modified in an unspecified way.
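(For what it's worth, a single Gaussian/RBF kernel with a sufficiently small bandwidth can already fit a checkerboard reasonably well; the sketch below assumes scikit-learn is available, and the grid size, `gamma`, and `C` values are illustrative choices, not taken from Cristianini.)

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 500 random points in [0, 4)^2, labelled by a 4x4 checkerboard:
# the parity of the sum of the integer cell indices
X = rng.uniform(0, 4, size=(500, 2))
y = (np.floor(X[:, 0]).astype(int) + np.floor(X[:, 1]).astype(int)) % 2

# A small-bandwidth Gaussian kernel lets the decision surface
# wrap around each cell of the board
clf = SVC(kernel="rbf", gamma=10.0, C=10.0).fit(X, y)
print(clf.score(X, y))  # high training accuracy on the checkerboard
```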

If this question is too broad to answer here, I'd appreciate a reference to the construction of one such kernel function, though I'd prefer the example be somewhat simple.

asked May 14 '11 by JeremyKun


1 Answer

Q: "How does one design a kernel function for a learning problem?"

A: "Very carefully"

Trying the usual suspects (linear, polynomial, RBF) and using whichever works best really is sound advice for someone trying to get the most accurate predictive model they can. For what it's worth, it's a common criticism of SVMs that they seem to have a lot of parameters that you need to tune empirically. So at least you're not alone.
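(The "try the usual suspects" approach can be sketched as a cross-validated grid search; this assumes scikit-learn is available, and the toy `make_moons` data set and parameter ranges are illustrative choices.)

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_moons

# A toy non-linearly-separable data set standing in for your real problem
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# One entry per kernel family, each with its own hyperparameters to tune
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": [0.1, 1, 10], "C": [0.1, 1, 10]},
]

# 5-fold cross-validation picks the kernel and parameters empirically
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```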

If you really want to design a kernel for a specific problem then you are right, it is a machine learning problem all in itself. It's called the 'model selection problem'. I'm not exactly an expert myself here, but the best source of insight into kernel methods for me was the book 'Gaussian Processes for Machine Learning' by Rasmussen and Williams (it's freely available online), particularly chapters 4 and 5. I'm sorry that I can't say much more than 'read this huge book full of maths', but it's a complicated problem and they do a really good job of explaining it.

answered Oct 28 '22 by Stompchicken