Speed of SVM Kernels? Linear vs RBF vs Poly

I'm using scikit-learn in Python to create some SVM models while trying different kernels. The code is pretty simple and follows this form:

from time import time

from sklearn import svm

# One kernel at a time; each assignment overwrites clf
clf = svm.SVC(kernel='rbf', C=1, gamma=0.1)
clf = svm.SVC(kernel='linear', C=1, gamma=0.1)
clf = svm.SVC(kernel='poly', C=1, gamma=0.1)

t0 = time()
clf.fit(X_train, y_train)
print("Training time:", round(time() - t0, 3), "s")
pred = clf.predict(X_test)

The data has 8 features and a little over 3000 observations. I was surprised to see that the rbf kernel was fitted in under a second, whereas linear took 90 seconds and poly took hours.

I assumed that the non-linear kernels would be more complicated and take more time. Is there a reason the linear kernel is taking so much longer than rbf, and poly so much longer than both? Can it vary dramatically based on my data?

asked Apr 20 '17 by Nicholas Hassan


1 Answer

Did you scale your data?

This can become an issue with SVMs. According to A Practical Guide to Support Vector Classification:

Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems.
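
To see what that quote is getting at, here is a tiny made-up illustration (the numbers are hypothetical): the linear kernel is just an inner product, so a single large-valued feature dominates the kernel value, while scaled features keep it in a reasonable range.

import numpy as np

# Two made-up observations: a small-valued feature next to a large-valued one
x1 = np.array([3.0, 25000.0])
x2 = np.array([4.0, 31000.0])

# The linear kernel is just the inner product -- the large feature dominates it
print(np.dot(x1, x2))  # 775000012.0

# With both features scaled to [0, 1], the kernel value stays well-behaved
x1_scaled = np.array([0.30, 0.25])
x2_scaled = np.array([0.40, 0.31])
print(np.dot(x1_scaled, x2_scaled))  # 0.1975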

Now for an example, I will use the sklearn breast cancer dataset:

from time import time

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf_lin = SVC(kernel='linear', C=1.0, gamma=0.1)
clf_rbf = SVC(kernel='rbf', C=1.0, gamma=0.1)

start = time()
clf_lin.fit(X_train, y_train)
print("Linear Kernel Non-Normalized Fit Time: {0.4f} s".format(time() - start))
start = time()
clf_rbf.fit(X_train, y_train)
print("RBF Kernel Non-Normalized Fit Time: {0.4f} s".format(time() - start))

scaler = MinMaxScaler()  # Default behavior is to scale to [0,1]
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y)

start = time()
clf_lin.fit(X_train, y_train)
print("Linear Kernel Normalized Fit Time: {0.4f} s".format(time() - start))
start = time()
clf_rbf.fit(X_train, y_train)
print("RBF Kernel Normalized Fit Time: {0.4f} s".format(time() - start))

Outputs:

Linear Kernel Non-Normalized Fit Time: 0.8672
RBF Kernel Non-Normalized Fit Time: 0.0124
Linear Kernel Normalized Fit Time: 0.0021
RBF Kernel Normalized Fit Time: 0.0039

So you can see that on this dataset, with shape (569, 30), we get a pretty drastic improvement in fit time from a little scaling.

This behavior depends on the features with large values. Think about working in an infinite-dimensional space: as the values you populate that space with get larger, the distances between their multidimensional products grow enormously. I cannot stress that enough. Read about the curse of dimensionality, and read more than just the Wikipedia entry. This spacing is what makes the process take longer; the mathematics behind trying to separate the classes in this massive space simply gets drastically more complex, especially as the number of features and observations grows. Thus it is critical to always scale your data. Even if you are just doing a simple linear regression it is good practice, as you will remove any possible bias toward features with larger values.
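
One convenient way to make sure the scaling always travels with the model (so the test split is transformed with the statistics learned from the training split) is to wrap the scaler and the SVM in a Pipeline. A minimal sketch of that idea, again on the breast cancer data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# The scaler is fit on the training split only and re-applied automatically at predict time
model = make_pipeline(MinMaxScaler(), SVC(kernel='rbf', C=1.0, gamma=0.1))
model.fit(X_train, y_train)
print("Test accuracy: {0:.3f}".format(model.score(X_test, y_test)))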

answered Nov 15 '22 by Grr