I am new to the machine learning field and am currently trying to get a grasp of how the most common learning algorithms work and when to apply each of them. At the moment I am learning how Support Vector Machines work, and I have a question about custom kernel functions.
There is plenty of information on the web about the more standard (linear, RBF, polynomial) kernels for SVMs. I, however, would like to understand when it is reasonable to go for a custom kernel function. My questions are:
1) What are other possible kernels for SVMs?
2) In which situations would one apply custom kernels?
3) Can a custom kernel substantially improve the prediction quality of an SVM?
SVMs can use many different kinds of kernel functions, for instance linear, polynomial, radial basis function (RBF), and sigmoid kernels. The RBF kernel is the most popular default choice because it is localized (its response decays with distance) and is bounded over the entire input range.
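For reference, the RBF kernel is usually written K(x, y) = exp(-γ‖x − y‖²), where γ > 0 controls how quickly the response decays with the distance between x and y.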
The term "kernel" refers to a family of mathematical functions used in Support Vector Machines to manipulate the data. In general, a kernel function transforms the training data so that a non-linear decision surface becomes a linear separation problem in a higher-dimensional space.
Passing a Gram matrix: you can define your own kernel either by giving the kernel as a callable function or by precomputing the Gram matrix yourself. A common pattern, sketched below, is to first write a helper that builds the Gram matrix from the data and a kernel function, and then define the kernel itself (here, an RBF).
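Here is a minimal sketch of both options using numpy and scikit-learn's SVC. The toy data, the gamma value, and the helper names (gram_matrix, rbf) are illustrative assumptions, not anything prescribed by a particular library:

```python
import numpy as np
from sklearn.svm import SVC

def rbf(x, y, gamma=1.0):
    # RBF kernel between two vectors: exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gram_matrix(X1, X2, kernel):
    # Matrix of pairwise kernel values between rows of X1 and rows of X2
    return np.array([[kernel(x1, x2) for x2 in X2] for x1 in X1])

rng = np.random.RandomState(0)
X_train, y_train = rng.randn(20, 2), rng.randint(0, 2, 20)
X_test = rng.randn(5, 2)

# Option 1: give the kernel as a callable; sklearn calls it with two data matrices
clf = SVC(kernel=lambda A, B: gram_matrix(A, B, rbf))
clf.fit(X_train, y_train)

# Option 2: precompute the Gram matrix and declare kernel='precomputed'
clf2 = SVC(kernel='precomputed')
clf2.fit(gram_matrix(X_train, X_train, rbf), y_train)
pred = clf2.predict(gram_matrix(X_test, X_train, rbf))  # shape (n_test, n_train)
```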
The ultimate benefit of the kernel trick is that the objective function we optimize to fit the higher-dimensional decision boundary only involves the dot products of the transformed feature vectors. We can therefore substitute those dot products with the kernel function and never compute ϕ(x) explicitly.
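To make that concrete, here is a tiny numerical check (my own illustrative example): for the degree-2 homogeneous polynomial kernel K(x, y) = (x·y)² in two dimensions, the kernel value equals the explicit dot product under the feature map ϕ(x) = (x₁², √2·x₁x₂, x₂²):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])

lhs = np.dot(x, y) ** 2        # kernel: K(x, y) = (x . y)^2
rhs = np.dot(phi(x), phi(y))   # explicit dot product in feature space
print(lhs, rhs)                # both 16.0, with phi never needed by the kernel
```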
1) What are other possible kernels for SVMs?
There are infinitely many of them; see for example the list implemented in pykernels (which is far from exhaustive):
https://github.com/gmum/pykernels
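For a flavour of what exists beyond the standard ones, here are a few well-known kernels sketched directly in numpy rather than through any particular library. The hyperparameter values are arbitrary, and histogram intersection is only a valid kernel on non-negative vectors:

```python
import numpy as np

def laplacian_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||_1): like the RBF, but built on the L1 distance
    return np.exp(-gamma * np.sum(np.abs(x - y)))

def histogram_intersection_kernel(x, y):
    # Sum of elementwise minima; popular for histogram features in vision
    return np.sum(np.minimum(x, y))

def cauchy_kernel(x, y, sigma=1.0):
    # 1 / (1 + ||x - y||^2 / sigma^2): heavier tails than the RBF
    return 1.0 / (1.0 + np.sum((x - y) ** 2) / sigma ** 2)
```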
2) In which situation one would apply custom kernels?
Basically in two cases:
- when your data has a non-standard structure (strings, graphs, sets, time series) to which the usual vector kernels do not apply directly, so you need a kernel that measures similarity between such objects;
- when you have prior (expert) knowledge about the problem that suggests a particular notion of similarity, which you can encode directly in a kernel rather than hoping a generic one will capture it.
An example of the first case is sketched below.
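As an illustration of the first case, here is a sketch of a simple k-spectrum string kernel, which counts shared length-k substrings between two strings (the example strings and k=3 are purely illustrative):

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    # k-spectrum kernel: the dot product of the k-mer count vectors of two strings
    grams_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    grams_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(count * grams_t[g] for g, count in grams_s.items())

print(spectrum_kernel("GATTACA", "TACATAG"))  # 2 shared 3-mers: TAC and ACA
```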
3) Can custom kernel substantially improve prediction quality of SVM?
Yes. In particular, there always exists a (hypothetical) Bayes-optimal kernel, defined as:
K(x, y) = 1 if argmax_l P(l|x) = argmax_l P(l|y), and 0 otherwise
In other words, if one has access to the true probability P(l|x) of label l being assigned to a point x, then one can build a kernel that effectively maps data points onto one-hot encodings of their most probable labels, leading to Bayes-optimal classification (it attains the Bayes risk).
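As a toy sketch (entirely synthetic, purely to illustrate the construction): on a problem where the most probable label is known by construction, say label = 1 iff the first coordinate is positive, the Bayes-optimal kernel can be precomputed, and an SVM trained on it classifies perfectly:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

def bayes_label(X):
    # The (here fully known) most probable label: argmax_l P(l|x)
    return (X[:, 0] > 0).astype(int)

def bayes_optimal_kernel(A, B):
    # K(x, y) = 1 iff the most probable label of x equals that of y, else 0
    return (bayes_label(A)[:, None] == bayes_label(B)[None, :]).astype(float)

X_train, X_test = rng.randn(40, 2), rng.randn(10, 2)
y_train, y_test = bayes_label(X_train), bayes_label(X_test)

clf = SVC(kernel='precomputed')
clf.fit(bayes_optimal_kernel(X_train, X_train), y_train)
acc = (clf.predict(bayes_optimal_kernel(X_test, X_train)) == y_test).mean()
print(acc)  # 1.0: the kernel already encodes the Bayes decision
```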
In practice it is of course impossible to obtain such a kernel, as it would mean you had already solved your problem. However, it shows that there is a notion of an "optimal kernel", and obviously none of the classical ones is of this type (unless your data comes from a very simple distribution). Furthermore, every kernel is a kind of prior over decision functions: the closer the family of functions it induces is to the true one, the more likely SVM is to yield a reasonable classifier.