Determining the most contributing features for SVM classifier in sklearn

I have a dataset and I want to train my model on that data. After training, I need to know which features contribute most to the classification for an SVM classifier.

Forest algorithms have something called feature importance. Is there anything similar for SVM?

Jibin Mathew asked Jan 11 '17 13:01


2 Answers

Yes, the SVM classifier has a coef_ attribute, but it is only available with a linear kernel. For other kernels it is not possible, because the kernel method transforms the data into another space that is not directly related to the input space.

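from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(coef, names):
    # Sort the features by coefficient so the bars appear in order
    imp, names = zip(*sorted(zip(coef, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2']
clf = svm.SVC(kernel='linear')  # named clf to avoid shadowing the svm module
clf.fit(X, Y)  # X, Y: your training data and labels
# coef_ has shape (1, n_features) for a binary problem, so pass the first row
f_importances(clf.coef_[0], features_names)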

The function outputs a horizontal bar chart of the feature importances, sorted by coefficient.
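The claim about kernels can be checked directly. This is a minimal sketch, assuming synthetic data from make_classification; the data and the variable names are illustrative and not part of the original answer. With a linear kernel, coef_ holds one weight per feature; with an RBF kernel, accessing coef_ raises AttributeError.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem (assumed data, for illustration only)
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

linear_clf = SVC(kernel='linear').fit(X, y)
# Binary case: coef_ has shape (1, n_features)
linear_shape = linear_clf.coef_.shape

rbf_clf = SVC(kernel='rbf').fit(X, y)
try:
    rbf_clf.coef_  # only defined for the linear kernel
    rbf_has_coef = True
except AttributeError:
    rbf_has_coef = False

print(linear_shape, rbf_has_coef)
```

Keep in mind that coefficient magnitudes are only comparable across features when the features are on a similar scale.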

Jakub Macina answered Oct 05 '22 23:10


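The plot itself takes only one line of code.

First, fit an SVM model:

import pandas as pd
from sklearn import svm

clf = svm.SVC(gamma=0.001, C=100., kernel='linear')  # gamma has no effect with a linear kernel
clf.fit(features, labels)  # features: DataFrame of inputs; labels: target vector (name assumed)

Then plot the ten largest coefficients by absolute value:

pd.Series(abs(clf.coef_[0]), index=features.columns).nlargest(10).plot(kind='barh')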

The result will be a bar chart of the most contributing features of the SVM model, in absolute values.
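One caveat when indexing coef_[0]: for a problem with more than two classes, a linear SVC stores one row of weights per pair of classes, so the first row covers only one pair. The sketch below assumes a synthetic 4-class dataset and hypothetical column names, and it averages the absolute weights over all pairs. That averaging is just one possible summary chosen for illustration, not a standard definition of importance.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 4-class problem (assumed data, for illustration only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=0)
# Hypothetical column names standing in for features.columns
features = pd.DataFrame(X, columns=[f'input{i}' for i in range(1, 6)])

clf = SVC(kernel='linear').fit(features, y)

# One row per pair of classes: 4 * 3 / 2 = 6 rows, one column per feature
coef_shape = clf.coef_.shape

# Assumed summary: mean absolute weight of each feature over all class pairs
mean_abs = np.abs(clf.coef_).mean(axis=0)
top = pd.Series(mean_abs, index=features.columns).nlargest(3)
print(coef_shape)
print(top)
```

Calling top.plot(kind='barh') would give the same kind of chart as the one-liner above.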

Dor answered Oct 05 '22 22:10