Sklearn trying to convert string list to floats

Question

I am trying to get sklearn.svm.SVC(kernel="linear") to work. My X is an array built with [misc.imread(each).flatten() for each in filenames], and my y2 is a slice of a list of strings such as ["A", "1", "4", "F", ...].

When I call clf.fit(X, y2), sklearn tries to convert my list of strings to floats and fails with ValueError: could not convert string to float. How can I solve this?

EDIT: Upgrading sklearn to 0.15 solved the problem.

Matt · Accepted Answer

scikit-learn has a helper class for exactly this, sklearn.preprocessing.LabelEncoder, which maps each distinct label to an integer:

from sklearn.preprocessing import LabelEncoder

y2 = ["A", "1", "4", "F", "A", "1", "4", "F"]
lb = LabelEncoder()
# fit_transform sorts the distinct labels ("1", "4", "A", "F") and maps each to its index
y = lb.fit_transform(y2)
# y is now: array([2, 0, 1, 3, 2, 0, 1, 3])
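To make the mapping less opaque, here is a rough pure-Python sketch of the idea behind label encoding. It is not scikit-learn's actual implementation, and the helper names fit_classes, encode, and decode are hypothetical, chosen only for illustration. It assumes the distinct labels are sorted and each label is replaced by its position in that sorted list.

```python
# Conceptual sketch of label encoding; NOT scikit-learn's real code.
# fit_classes, encode and decode are hypothetical helper names.

def fit_classes(labels):
    """Return the distinct labels in sorted order."""
    return sorted(set(labels))

def encode(labels, classes):
    """Replace each label with its index in the sorted class list."""
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]

def decode(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]

y2 = ["A", "1", "4", "F", "A", "1", "4", "F"]
classes = fit_classes(y2)          # ['1', '4', 'A', 'F']
codes = encode(y2, classes)        # [2, 0, 1, 3, 2, 0, 1, 3]
restored = decode(codes, classes)  # same as y2

print(classes)
print(codes)
print(restored == y2)
```

The integer codes match the LabelEncoder output shown above because digits sort before uppercase letters.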

To get back to the original labels (e.g. after classifying unseen data with SVC), pass the integer labels to LabelEncoder's inverse_transform, which restores the strings:

lb.inverse_transform(y)
# => array(['A', '1', '4', 'F', 'A', '1', '4', 'F'], dtype='|S1')
# (on Python 3 the dtype is reported as '<U1' instead of '|S1')
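Putting the pieces together, the round trip might look roughly like the sketch below. The feature matrix X here is made-up toy data standing in for the flattened images from the question (an assumption, since the real images are not available), and the variable names pred_codes and pred_labels are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Hypothetical toy features standing in for the flattened images.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9],
])
y2 = ["A", "1", "4", "F", "A", "1", "4", "F"]

# Encode the string labels as integers before fitting.
lb = LabelEncoder()
y = lb.fit_transform(y2)

clf = SVC(kernel="linear")
clf.fit(X, y)

# predict returns integer codes; inverse_transform restores the strings.
pred_codes = clf.predict(X)
pred_labels = lb.inverse_transform(pred_codes)
print(pred_labels)
```

Keeping the same fitted LabelEncoder around matters: decoding with a freshly fitted encoder could assign different integers if the set of labels differs.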

Tags: python, numpy, scikit-learn

sikerbela
