
Sklearn changing string class label to int

I have a pandas dataframe and I'm trying to change the values in a given column which are represented by strings into integers. For instance:

df = index    fruit   quantity   price 
         0    apple          5    0.99
         1    apple          2    0.99
         2   orange          4    0.89
         4   banana          1    1.64
       ...
     10023     kiwi         10    0.92

I would like it to look like:

df = index    fruit   quantity   price 
         0        1          5    0.99
         1        1          2    0.99
         2        2          4    0.89
         4        3          1    1.64
       ...
     10023        5         10    0.92

I can do this using

df["fruit"] = df["fruit"].map({"apple": 1, "orange": 2,...})

which works if I have a small list to change, but I'm looking at a column with over 500 different labels. Is there any way of changing this from a string to an int?

asked Feb 18 '17 21:02 by Lukasz


1 Answer

You can use LabelEncoder from sklearn.preprocessing:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
# Learn the set of distinct labels in the column
le.fit(df.fruit)
# Replace each string label with its integer code
df['categorical_label'] = le.transform(df.fruit)
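
As a minimal sketch of what this produces, assuming a small made-up frame in place of the real data: LabelEncoder assigns zero-based codes in sorted order of the labels, so the integers will not match the 1-based, order-of-appearance numbering shown in the question. The `mapping` dict is only an illustration for inspecting the result.

```python
import pandas as pd
from sklearn import preprocessing

# Toy data standing in for the real dataframe (values are illustrative only).
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "banana", "kiwi"],
    "quantity": [5, 2, 4, 1, 10],
})

le = preprocessing.LabelEncoder()
# fit_transform learns the label set and encodes the column in one step.
df["categorical_label"] = le.fit_transform(df["fruit"])

# classes_ holds the sorted labels; a label's position is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)
print(df)
```

With 500+ labels the same calls apply unchanged; the mapping can be inspected through `le.classes_` instead of being written by hand.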

To transform the integer labels back to the original strings:

le.inverse_transform(df['categorical_label'])
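
If the exact numbering from the question matters (codes starting at 1, assigned in order of first appearance), `pandas.factorize` is a possible alternative that does not need sklearn. This is a different technique from the LabelEncoder above, sketched here on made-up data; the column name `fruit_code` is an assumption for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataframe.
df = pd.DataFrame({"fruit": ["apple", "apple", "orange", "banana", "kiwi"]})

# factorize returns zero-based codes in order of first appearance,
# plus an Index of the unique labels for mapping back.
codes, uniques = pd.factorize(df["fruit"])
df["fruit_code"] = codes + 1  # shift to 1-based, as in the question

# Reverse the mapping: undo the shift, then look up the unique labels.
restored = uniques.take(df["fruit_code"].to_numpy() - 1)
print(df)
print(list(restored))
```

Keeping `uniques` around plays the same role as keeping the fitted `le` object: without it the integers cannot be turned back into strings.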
answered Sep 23 '22 23:09 by Hugo Lemieux-Fournier