I am trying to make a sklearn.svm.SVC(kernel="linear") algorithm work. My X is an array made with [misc.imread(each).flatten() for each in filenames] and my y2 is part of a list made of strings such as ["A","1","4","F"..].
When I try to clf.fit(X,y2), sklearn tries to convert my string list into floats and fails, throwing ValueError: could not convert string to float. How can I solve this?
EDIT: Upgrading sklearn to 0.15 solved the problem.
scikit-learn has a helper class that handles this nicely, sklearn.preprocessing.LabelEncoder. It maps each distinct string label to an integer:
from sklearn.preprocessing import LabelEncoder
y2 = ["A","1","4","F","A","1","4","F"]
lb = LabelEncoder()
y = lb.fit_transform(y2)
# y is now: array([2, 0, 1, 3, 2, 0, 1, 3])
To get back to your original labels (e.g. after classifying unseen data using SVC), use the inverse_transform method of LabelEncoder to restore the string labels:
lb.inverse_transform(y)
# => array(['A', '1', '4', 'F', 'A', '1', '4', 'F'], dtype='|S1')
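Putting the two steps together, here is a minimal end-to-end sketch of encoding the labels, fitting the SVC, and decoding the predictions. The data is an assumption: X is a small synthetic array standing in for the flattened images from the question, and the names centers, pred_int and pred_labels are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Synthetic stand-in for the flattened images (assumption, not the asker's data):
# four well-separated clusters, five samples each.
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])
y2 = [label for label in ["A", "1", "4", "F"] for _ in range(5)]

# Encode the string labels as integers before fitting.
lb = LabelEncoder()
y = lb.fit_transform(y2)

clf = SVC(kernel="linear")
clf.fit(X, y)

# Predictions come back as integers; decode them to the original strings.
pred_int = clf.predict(centers)
pred_labels = lb.inverse_transform(pred_int)
print(list(pred_labels))
```

lb.classes_ holds the sorted distinct labels, so the integer assigned to a label is its index in that array. As the EDIT in the question suggests, newer scikit-learn versions should accept the string labels in clf.fit(X, y2) directly, in which case the encoder is only needed if you want the integer codes yourself.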