
scikit learn svc coef0 parameter range

Documentation here.

I'm wondering how important the coef0 parameter is for SVCs under the polynomial and sigmoid kernels. As I understand it, it is the intercept term, just a constant as in linear regression to offset the function from zero. However, to my knowledge, the SVM (scikit-learn uses libsvm) should find this value on its own.

What's a good general range to test over (is there one)? For example, with C, a safe general choice is 10^-5 ... 10^5, going up in exponential steps.

But for coef0, the value seems highly data-dependent, and I'm not sure how to automate choosing good ranges for each grid search on each dataset. Any pointers?
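
For concreteness, here is a minimal sketch of the kind of grid search I have in mind, using scikit-learn's GridSearchCV; the coef0 values are hand-picked placeholders, which is exactly the part I'd like to automate:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)   # stand-in dataset

    param_grid = {
        "C": np.logspace(-5, 5, 11),          # 10^-5 ... 10^5 in exponential steps
        "coef0": [0.0, 0.1, 0.5, 1.0, 2.0],   # placeholder values -- data dependent
    }

    search = GridSearchCV(SVC(kernel="poly", degree=3), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)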

asked Jan 27 '14 by lollercoaster


People also ask

What is Sklearn SVM import SVC?

sklearn.svm.SVC is scikit-learn's C-support vector classification class, built on top of libsvm. It handles multi-class problems using a one-vs-one scheme.
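
A small usage sketch (not part of the original answer), assuming the current scikit-learn API, showing the one-vs-one behaviour described above:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)          # 3-class toy problem
    clf = SVC(kernel="rbf", C=1.0, decision_function_shape="ovo")
    clf.fit(X, y)

    print(clf.predict(X[:5]))                  # predicted class labels
    print(clf.decision_function(X[:5]).shape)  # (5, 3): one score per class pair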

What is coef0 in SVC?

coef0 lets you adjust the independent term in your kernel function; it is only used by the polynomial and sigmoid kernels, and you should most likely leave it alone. The probability parameter setting may also prove useful to you.

What values can be used for the kernel parameter of SVC class?

gamma − {'scale', 'auto'} or float. It is the kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels. If you choose the default, i.e. gamma='scale', then the value of gamma used by SVC is 1 / (n_features * X.var()).
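
As a rough illustration (my own sketch, assuming current scikit-learn behaviour), gamma='scale' resolves to the same number you would get by computing 1 / (n_features * X.var()) yourself:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    gamma_scale = 1.0 / (X.shape[1] * X.var())   # what gamma='scale' works out to
    clf_scale = SVC(kernel="rbf", gamma="scale").fit(X, y)
    clf_float = SVC(kernel="rbf", gamma=gamma_scale).fit(X, y)

    # Same gamma, so both models should predict identically.
    print(np.array_equal(clf_scale.predict(X), clf_float.predict(X)))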

What is the difference between SVM and SVC?

In this usage, a classifier that separates the dataset with a linear hyperplane is referred to as SVC, while one that separates the dataset with a non-linear approach is referred to as SVM; the non-linear SVM compensates for the limitation of the linear SVC. That is the difference between SVM and SVC.


1 Answer

First, the sigmoid function is rarely used as a kernel. In fact, for almost no parameter values is it known to induce a valid kernel (in Mercer's sense).

Second, coef0 is not an intercept term; it is a parameter of the kernel projection, and it can be used to overcome one of the important issues with the polynomial kernel. In general, just using coef0=0 should be fine, but the polynomial kernel has one problem: as p -> inf, it more and more strongly separates pairs of points for which <x,y> is smaller than 1 from pairs with a bigger value. This is because powers of values smaller than one get closer and closer to 0, while the same powers of values bigger than one grow to infinity. You can use coef0 to "scale" your data so there is no such distinction - you can add 1 - min <x,y>, so that no values are smaller than 1. If you really feel the need to tune this parameter, I would suggest searching in the range [min(1 - min <x,y>, 0), max(<x,y>)], where the min and max are computed over the whole training set.
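
A short sketch of this heuristic (my reading of the suggestion above, not code from the answer): compute the inner products <x,y> over the training set, derive the shift 1 - min <x,y>, and build the suggested search range from it:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    gram = X @ X.T                      # all pairwise inner products <x, y>
    lo, hi = gram.min(), gram.max()

    coef0_shift = 1.0 - lo                    # makes every <x, y> + coef0 >= 1
    search_range = (min(1.0 - lo, 0.0), hi)   # range suggested above
    print(coef0_shift, search_range)

    clf = SVC(kernel="poly", degree=3, coef0=coef0_shift).fit(X, y)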

answered Oct 17 '22 by lejlot