
SVC (support vector classification) with categorical (string) data as labels

I use scikit-learn to implement a simple supervised learning algorithm. In essence I follow the tutorial here (but with my own data).

I try to fit the model:

from sklearn import svm

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)

But at the second line, I get an error: ValueError: could not convert string to float: 'A'

I expected the error because labels_training contains string values that represent three different categories, such as A, B, and C.

So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be to simply convert each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?

beta asked Jul 26 '16 08:07


1 Answer

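Take a look at section 4.3.4, "Encoding categorical features", of the scikit-learn preprocessing documentation: http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features

In particular, look at OneHotEncoder. It converts categorical feature values into a numeric format that SVMs can use: one binary column per category, so no artificial ordering (A < B < C) is introduced.

If the strings appear only in the labels, no conversion should be needed. scikit-learn classifiers such as SVC accept string class labels directly, so an error like this one usually points to string values in the feature matrix.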
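As a rough illustration, here is a minimal sketch of that approach. The toy data (a string-valued `colors` column and a numeric `sizes` column) is hypothetical and only stands in for the asker's real features; the SVC parameters are the ones from the question.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical (string) feature and one numeric feature.
colors = np.array([["red"], ["green"], ["blue"], ["red"], ["green"], ["blue"]])
sizes = np.array([[1.0], [2.0], [3.0], [1.1], [2.1], [3.1]])

# String class labels are passed to SVC unchanged.
labels_training = np.array(["A", "B", "C", "A", "B", "C"])

# One-hot encode the categorical column. handle_unknown="ignore" encodes
# categories not seen during fit as an all-zero row instead of raising.
encoder = OneHotEncoder(handle_unknown="ignore")
colors_encoded = encoder.fit_transform(colors).toarray()

# Combine the encoded column with the numeric column into one feature matrix.
features_training = np.hstack([colors_encoded, sizes])

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)

# Predictions come back as the original string labels.
predictions = clf.predict(features_training)
```

New data has to go through the same fitted encoder (`encoder.transform`, not `fit_transform`) before calling `clf.predict`, so that the columns line up with those seen during training.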

FuriousGeorge answered Sep 28 '22 02:09