 

Custom kernels for SVM, when to apply them?

I am new to the machine learning field and am currently trying to get a grasp of how the most common learning algorithms work and when to apply each of them. At the moment I am learning how Support Vector Machines work and have a question about custom kernel functions.
There is plenty of information on the web about the more standard (linear, RBF, polynomial) kernels for SVMs. I would, however, like to understand when it is reasonable to go for a custom kernel function. My questions are:

1) What are other possible kernels for SVMs?
2) In which situations would one apply custom kernels?
3) Can a custom kernel substantially improve the prediction quality of an SVM?

kroonike asked May 26 '16 18:05

People also ask

Which kernel should I use for SVM?

Different SVM algorithms use different kinds of kernel functions, for instance linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. The most commonly used kernel function is RBF, because it is localized and has a finite response across the entire real axis.

Why do we use kernels in SVM?

A "kernel" is one of a set of mathematical functions that give a Support Vector Machine a window through which to manipulate the data. The kernel function transforms the training data so that a non-linear decision surface can be expressed as a linear boundary in a higher-dimensional space.

How can you specify your own kernel function in the SVM?

You can define your own kernels either by passing the kernel as a function or by precomputing the Gram matrix. In the first case you write a function that, given two sets of samples, returns their Gram matrix; in the second you compute that matrix yourself and hand it to the SVM directly.
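
As a concrete sketch (assuming scikit-learn's SVC, which the question does not name but which supports both options; the rbf_kernel helper below is just an illustrative custom kernel):

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D data with two classes.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 3])
    y = np.array([0] * 20 + [1] * 20)

    def rbf_kernel(A, B, gamma=0.5):
        # A custom kernel must accept two sample matrices of shapes (n_a, d) and (n_b, d)
        # and return the (n_a, n_b) Gram matrix.
        sq_dists = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq_dists)

    # Option 1: pass the kernel as a callable.
    clf = SVC(kernel=rbf_kernel).fit(X, y)

    # Option 2: precompute the Gram matrix and declare the kernel as 'precomputed'.
    clf2 = SVC(kernel='precomputed').fit(rbf_kernel(X, X), y)

    # At prediction time the precomputed variant needs the (n_test, n_train) Gram matrix.
    X_test = rng.randn(5, 2) + 1.5
    print(clf.predict(X_test))
    print(clf2.predict(rbf_kernel(X_test, X)))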

Why do we apply kernel trick?

The ultimate benefit of the kernel trick is that the objective function we optimize to fit the higher-dimensional decision boundary only involves dot products of the transformed feature vectors. Therefore, we can simply substitute these dot-product terms with the kernel function and never compute ϕ(x) explicitly.
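
A small numerical check (my own illustration, using a degree-2 polynomial kernel as the example) makes this concrete: the kernel value K(x, y) = (xᵀy)² equals the dot product of the explicitly mapped features ϕ(x) and ϕ(y), so ϕ never has to be materialized:

    import numpy as np

    def phi(v):
        # Explicit degree-2 feature map for a 2-D vector (x1, x2):
        # phi(v) = (x1^2, x2^2, sqrt(2) * x1 * x2)
        x1, x2 = v
        return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

    def poly_kernel(u, v):
        # Degree-2 polynomial kernel, computed directly in the original space.
        return (u @ v) ** 2

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])

    print(phi(x) @ phi(y))    # 1.0: dot product in the mapped space
    print(poly_kernel(x, y))  # 1.0: same value, without ever computing phi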


1 Answer

1) What are other possible kernels for SVMs?

There are infinitely many of them; see for example the list of kernels implemented in pykernels (which is far from exhaustive):

https://github.com/gmum/pykernels

  • Linear
  • Polynomial
  • RBF
  • Cosine similarity
  • Exponential
  • Laplacian
  • Rational quadratic
  • Inverse multiquadratic
  • Cauchy
  • T-Student
  • ANOVA
  • Additive Chi^2
  • Chi^2
  • MinMax
  • Min/Histogram intersection
  • Generalized histogram intersection
  • Spline
  • Sorensen
  • Tanimoto
  • Wavelet
  • Fourier
  • Log (CPD)
  • Power (CPD)

2) In which situations would one apply custom kernels?

Basically in two cases:

  • "simple" ones give very bad results
  • data is specific in some sense and so - in order to apply traditional kernels one has to degenerate it. For example if your data is in a graph format, you cannot apply RBF kernel, as graph is not a constant-size vector, thus you need a graph kernel to work with this object without some kind of information-loosing projection. also sometimes you have an insight into data, you know about some underlying structure, which might help classifier. One such example is a periodicity, you know that there is a kind of recuring effect in your data - then it might be worth looking for a specific kernel etc.
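
As an illustration of the periodicity case above (my own sketch, not part of the original answer; the period and length-scale values are arbitrary), an exp-sine-squared "periodic" kernel can be passed to scikit-learn's SVC as a callable:

    import numpy as np
    from sklearn.svm import SVC

    def periodic_kernel(A, B, period=1.0, length_scale=0.5):
        # Exp-sine-squared kernel: K(x, y) = exp(-2 * sin^2(pi * |x - y| / period) / length_scale^2).
        # A valid (positive semi-definite) kernel, useful when labels follow a repeating pattern.
        dists = np.abs(A[:, None, 0] - B[None, :, 0])  # pairwise |x - y| for 1-D inputs
        return np.exp(-2.0 * np.sin(np.pi * dists / period) ** 2 / length_scale ** 2)

    # Toy 1-D data whose label repeats with period 1: class 1 on the first half of each period.
    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, size=(200, 1))
    y = (X[:, 0] % 1.0 < 0.5).astype(int)

    clf = SVC(kernel=periodic_kernel).fit(X, y)
    print(clf.score(X, y))  # training accuracy; should be high because the kernel matches the structure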

3) Can a custom kernel substantially improve the prediction quality of an SVM?

Yes. In particular, there always exists a (hypothetical) Bayes-optimal kernel, defined as:

K(x, y) = 1 iff argmax_l P(l|x) == argmax_l P(l|y), and 0 otherwise

In other words, if one has the true probability P(l|x) of label l being assigned to a point x, then one can create a kernel that essentially maps the data points onto one-hot encodings of their most probable labels, thus leading to Bayes-optimal classification (as it attains the Bayes risk).
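
Purely to illustrate the definition (this is a hypothetical construction: most_probable_label below stands in for argmax_l P(l|x), which in reality is exactly what we are trying to learn), one can build the Gram matrix of this kernel on a toy problem and feed it to an SVM as a precomputed kernel:

    import numpy as np
    from sklearn.svm import SVC

    # Toy 1-D problem where the most probable label of x is simply 1 iff x > 0.
    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(100, 1))
    y = (X[:, 0] > 0).astype(int)

    def most_probable_label(x):
        # Stand-in for argmax_l P(l | x); knowing this already solves the problem.
        return int(x[0] > 0)

    def bayes_optimal_gram(A, B):
        # K(x, y) = 1 iff the most probable labels of x and y agree, else 0.
        la = np.array([most_probable_label(a) for a in A])
        lb = np.array([most_probable_label(b) for b in B])
        return (la[:, None] == lb[None, :]).astype(float)

    clf = SVC(kernel='precomputed').fit(bayes_optimal_gram(X, X), y)
    print(clf.score(bayes_optimal_gram(X, X), y))  # 1.0: the kernel already encodes the answer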

In practice it is of course impossible to get such a kernel, as it would mean that you had already solved your problem. Still, it shows that there is a notion of an "optimal kernel", and obviously none of the classical ones is of this type (unless your data comes from a very simple distribution). Furthermore, every kernel is a kind of prior over decision functions: the closer the induced family of functions is to the true one, the more likely you are to get a reasonable classifier with an SVM.

lejlot answered Sep 20 '22 22:09