I use scikit-learn to implement a simple supervised learning algorithm. In essence, I follow the tutorial here (but with my own data).
I try to fit the model:
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)
But the call to fit raises an error: ValueError: could not convert string to float: 'A'
The error is expected because labels_training contains string values that represent three different categories, such as A, B, and C.
So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be to simply convert each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?
Take a look at section 4.3.4, Encoding categorical features, in the scikit-learn preprocessing documentation: http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
In particular, look at using the OneHotEncoder. This will convert categorical values into a format that can be used by SVMs.
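As a rough illustration of the encoding step, here is a minimal sketch. The toy data (raw_features, the colour strings) and the variable names feature_encoder and label_encoder are made up for the example and are not from the question. It assumes the string values may sit in the features, the labels, or both: OneHotEncoder handles a categorical feature column, and LabelEncoder performs the "A = 0, B = 1" mapping proposed in the question for the labels.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one categorical feature column and string labels.
raw_features = np.array([["red"], ["green"], ["blue"],
                         ["red"], ["green"], ["blue"]])
labels_training = np.array(["A", "B", "C", "A", "B", "C"])

# One-hot encode the categorical feature: one 0/1 column per category.
# handle_unknown="ignore" maps categories unseen during fit to all zeros.
feature_encoder = OneHotEncoder(handle_unknown="ignore")
features_training = feature_encoder.fit_transform(raw_features)

# Map string labels to integers; classes are sorted, so A=0, B=1, C=2.
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels_training)

# Same estimator settings as in the question.
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, encoded_labels)

# Encode a new sample the same way, then map the prediction back to a string.
new_sample = feature_encoder.transform([["red"]])
predicted = label_encoder.inverse_transform(clf.predict(new_sample))
```

One-hot encoding suits input features because it does not impose an artificial order on the categories, whereas a plain integer mapping would suggest that C is "further" from A than B is. For class labels that ordering does not matter to a classifier, which is why LabelEncoder is meant for targets rather than features.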