 

What is the fastest way to compute an RBF kernel in Python?

Tags:

python

numpy

I would like to compute an RBF or "Gaussian" kernel for a data matrix X with n rows and d columns. The resulting square kernel matrix is given by:

K[i,j] = var * exp(-gamma * ||X[i] - X[j]||^2)

var and gamma are scalars.
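
For reference, a direct but slow double-loop translation of this definition, just to pin down the target matrix (the helper name rbf_naive is purely illustrative; the answers below are about vectorizing this):

import numpy as np

def rbf_naive(X, gamma, var):
    # Illustrative O(n^2 * d) reference for K[i, j] = var * exp(-gamma * ||X[i] - X[j]||^2)
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            diff = X[i] - X[j]
            K[i, j] = var * np.exp(-gamma * np.dot(diff, diff))
    return K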

What is the fastest way to do this in Python?

Asked Nov 13 '17 by Callidior

People also ask

What is RBF kernel in Python?

RBF is the default kernel used within sklearn's SVM classification algorithm and can be described by the formula K(x, y) = exp(-gamma * ||x - y||^2), where gamma can be set manually and has to be > 0.
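
For illustration, a minimal sketch of using that default kernel with an explicit gamma (the data and the gamma value here are made up):

import numpy as np
from sklearn.svm import SVC

X_train = np.random.randn(100, 4)            # toy data, purely illustrative
y_train = np.random.randint(0, 2, size=100)

clf = SVC(kernel='rbf', gamma=0.1)           # 'rbf' is already the default kernel
clf.fit(X_train, y_train)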

Why RBF kernel is the best?

The RBF kernel is popular because of its similarity to the k-Nearest Neighbors algorithm. It has the advantages of k-NN while overcoming the space complexity problem: an RBF-kernel support vector machine only needs to store the support vectors at training time, not the entire dataset.

Is RBF kernel same as Gaussian?

Yes. "RBF kernel" and "Gaussian kernel" are two names for the same function: exp(-gamma * ||x - y||^2), with gamma often written as 1 / (2 * sigma^2).

What is RBF kernel for SVM?

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.


1 Answer

I am going to present four different methods for computing such a kernel, followed by a comparison of their run-times.

Using pure numpy

Here, I use the fact that ||x-y||^2 = ||x||^2 + ||y||^2 - 2 * x^T * y.

import numpy as np

X_norm = np.sum(X ** 2, axis=-1)
# Broadcast the squared row norms against the Gram matrix np.dot(X, X.T)
# to get all pairwise squared distances in one shot.
K = var * np.exp(-gamma * (X_norm[:, None] + X_norm[None, :] - 2 * np.dot(X, X.T)))

Using numexpr

numexpr is a Python package that allows for efficient and parallelized array operations on numpy arrays. We can use it as follows to perform the same computation as above:

import numpy as np
import numexpr as ne

X_norm = np.sum(X ** 2, axis=-1)
# The dict maps the names in the expression string to actual arrays/scalars.
K = ne.evaluate('v * exp(-g * (A + B - 2 * C))', {
        'A': X_norm[:, None],
        'B': X_norm[None, :],
        'C': np.dot(X, X.T),
        'g': gamma,
        'v': var
})

Using scipy.spatial.distance.pdist

We could also use scipy.spatial.distance.pdist to compute a condensed, non-redundant array of pairwise squared Euclidean distances, apply the kernel to that array, and then expand it to a square matrix:

import numpy as np
from scipy.spatial.distance import pdist, squareform

K = squareform(var * np.exp(-gamma * pdist(X, 'sqeuclidean')))
np.fill_diagonal(K, var)  # squareform leaves zeros on the diagonal; the correct value is var * exp(0) = var

Using sklearn.metrics.pairwise.rbf_kernel

sklearn provides a built-in method for direct computation of an RBF kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

K = var * rbf_kernel(X, gamma=gamma)

Run-time comparison

I use 25,000 random samples of 512 dimensions for testing and perform experiments on an Intel Core i7-7700HQ (4 cores @ 2.8 GHz). More precisely:

X = np.random.randn(25000, 512)
gamma = 0.01
var = 5.0

Each method is run 7 times and the mean and standard deviation of the time per execution are reported.
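
These timings read like IPython %timeit output, whose default is 7 runs; a plain-Python sketch of an equivalent measurement, with run_method standing in for any of the four variants above (it uses the X, gamma and var defined here):

import timeit
import numpy as np

def run_method():
    # Stand-in: the pure numpy variant from above.
    X_norm = np.sum(X ** 2, axis=-1)
    return var * np.exp(-gamma * (X_norm[:, None] + X_norm[None, :] - 2 * np.dot(X, X.T)))

times = timeit.repeat(run_method, repeat=7, number=1)
print('%.2f s ± %.2f s per run' % (np.mean(times), np.std(times)))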

|               Method                |       Time        |
|-------------------------------------|-------------------|
| numpy                               | 24.2 s ± 1.06 s   |
| numexpr                             | 8.89 s ± 314 ms   |
| scipy.spatial.distance.pdist        | 2min 59s ± 312 ms |
| sklearn.metrics.pairwise.rbf_kernel | 13.9 s ± 757 ms   |

First of all, scipy.spatial.distance.pdist is surprisingly slow.

numexpr is almost 3 times faster than the pure numpy method, but this speed-up factor will vary with the number of available CPUs.
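
If you want to check or pin how many threads numexpr uses, it exposes helpers for exactly that:

import numexpr as ne

ne.detect_number_of_cores()  # number of cores numexpr detected
ne.set_num_threads(4)        # cap the worker threads; returns the previous setting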

sklearn.metrics.pairwise.rbf_kernel is not the fastest way, but only moderately slower than numexpr and still clearly faster than pure numpy.
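
For completeness, a small sanity check (not part of the benchmark) that the variants produce the same matrix up to floating-point error:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics.pairwise import rbf_kernel

X_small = np.random.randn(100, 16)
norms = np.sum(X_small ** 2, axis=-1)
K1 = var * np.exp(-gamma * (norms[:, None] + norms[None, :] - 2 * np.dot(X_small, X_small.T)))
K2 = squareform(var * np.exp(-gamma * pdist(X_small, 'sqeuclidean')))
np.fill_diagonal(K2, var)  # squareform leaves zeros on the diagonal
K3 = var * rbf_kernel(X_small, gamma=gamma)
assert np.allclose(K1, K2) and np.allclose(K1, K3)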

Answered Oct 03 '22 by Callidior