
Sklearn trying to convert string list to floats

I am trying to get sklearn.svm.SVC(kernel="linear") to work. My X is an array built with [misc.imread(each).flatten() for each in filenames], and my y2 is part of a list of strings such as ["A","1","4","F"..].

When I call clf.fit(X, y2), sklearn tries to convert my list of strings to floats and fails with ValueError: could not convert string to float. How can I solve this?

EDIT: Upgrading sklearn to 0.15 solved the problem.

sikerbela asked Jan 19 '15 01:01



1 Answer

scikit-learn has a helper class for exactly this, sklearn.preprocessing.LabelEncoder, which maps each distinct label to an integer:

from sklearn.preprocessing import LabelEncoder
y2 = ["A","1","4","F","A","1","4","F"]
lb = LabelEncoder()
y = lb.fit_transform(y2)
# y is now: array([2, 0, 1, 3, 2, 0, 1, 3])
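
If you want to see which integer each string was assigned, the fitted encoder exposes the sorted unique labels in its classes_ attribute. A small sketch, where the mapping dict is just an illustrative helper and not part of the original answer:

```python
from sklearn.preprocessing import LabelEncoder

lb = LabelEncoder()
lb.fit(["A", "1", "4", "F"])

# classes_ holds the sorted unique labels; each label's index is its code.
mapping = {str(label): int(code) for code, label in enumerate(lb.classes_)}
print(mapping)
```

Note that lb.transform raises a ValueError for labels it did not see during fit, so fit the encoder on the full set of labels you expect.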

To get back to your original labels (e.g. after classifying unseen data with SVC), use LabelEncoder's inverse_transform, which maps the integers back to the strings:

lb.inverse_transform(y)
# => array(['A', '1', '4', 'F', 'A', '1', '4', 'F'], dtype='|S1')
# (on Python 3 the dtype is reported as '<U1' instead of '|S1')
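
Putting the pieces together, a minimal end-to-end sketch might look like this. The toy feature matrix X_demo and the two query points are made up for illustration and stand in for the flattened images from the question; only LabelEncoder and SVC(kernel="linear") come from the posts above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated points per class, standing in
# for the flattened image arrays from the question.
X_demo = np.array([
    [0.0, 0.0], [0.1, 0.1],  # "A"
    [5.0, 5.0], [5.1, 5.1],  # "1"
    [0.0, 5.0], [0.1, 5.1],  # "4"
    [5.0, 0.0], [5.1, 0.1],  # "F"
])
y_str = ["A", "A", "1", "1", "4", "4", "F", "F"]

# Encode the string labels as integers before fitting.
lb = LabelEncoder()
y_int = lb.fit_transform(y_str)

clf = SVC(kernel="linear")
clf.fit(X_demo, y_int)

# Predict integer codes for unseen points, then map them back to strings.
pred_int = clf.predict(np.array([[0.05, 0.05], [5.05, 0.05]]))
pred_str = lb.inverse_transform(pred_int)
print(pred_str)
```

Keep the same fitted lb around for as long as you use the classifier, since the integer codes only make sense relative to that encoder.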
Matt answered Oct 01 '22 09:10
