
LabelEncoder specify classes in DataFrame

I'm applying a LabelEncoder to a pandas DataFrame, df:

Feat1  Feat2  Feat3  Feat4  Feat5
  A      A      A      A      E
  B      B      C      C      E
  C      D      C      C      E
  D      A      C      D      E

I apply the encoder to every column like this:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)

This is how the labels are mapped:

A = 0
B = 1
C = 2
D = 3
E = 0

I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.

I want E to be given the value 4, but I don't know how to do this in a DataFrame.

gbhrea asked Aug 11 '16 10:08




2 Answers

You could fit the label encoder once on all the unique values in the DataFrame, and then transform each column with that single fitted encoder:

In [4]: from sklearn import preprocessing
   ...: import numpy as np

In [5]: le = preprocessing.LabelEncoder()

In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()

In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']

In [8]: df.apply(le.transform)
Out[8]: 
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4
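The difference between the two approaches can be checked side by side. This is a minimal sketch that rebuilds the DataFrame from the question (the names `per_column` and `shared` are only illustrative): `df.apply(le.fit_transform)` refits the encoder on every column, so each column gets its own mapping starting at 0, while fitting once on all values gives one mapping for the whole frame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The example DataFrame from the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Per-column fit: the encoder is refitted on every column, so each column
# gets its own mapping that starts at 0.
per_column = df.apply(LabelEncoder().fit_transform)

# Shared fit: one encoder learns every value in the frame first,
# then the same mapping is applied to each column.
le = LabelEncoder().fit(np.unique(df.values))
shared = df.apply(le.transform)

print(per_column["Feat5"].tolist())  # E encoded within its own column
print(shared["Feat5"].tolist())      # E encoded against all five labels
```

With the per-column fit, Feat5 contains only one distinct label, so E gets code 0 there. With the shared fit, E is the fifth label overall and gets code 4.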

One way to specify the labels explicitly would be to fit the encoder on a list of them:

In [9]: labels = ['A', 'B', 'C', 'D', 'E']

In [10]: enc = le.fit(labels)

In [11]: enc.classes_                       # sorts the labels in alphabetical order
Out[11]: 
array(['A', 'B', 'C', 'D', 'E'], 
      dtype='<U1')

In [12]: enc.transform(['E'])               # transform expects a 1-D array-like, not a bare string
Out[12]: array([4])
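As an alternative, scikit-learn also provides `OrdinalEncoder`, which is intended for feature columns and accepts the classes up front through its `categories` parameter. This is a different transformer from the `LabelEncoder` used above, so treat the following as a sketch under that assumption rather than a drop-in replacement. It assumes every column shares the same label list, and `encoded` is just an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The example DataFrame from the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

labels = ["A", "B", "C", "D", "E"]

# `categories` takes one list per column; the position of a label in its
# list becomes its code. Here all columns share the same list (assumption).
enc = OrdinalEncoder(categories=[labels] * df.shape[1], dtype=np.int64)

# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
encoded = pd.DataFrame(enc.fit_transform(df), columns=df.columns, index=df.index)

print(encoded)
```

Because the codes come from the position in `labels`, E maps to 4 even in a column where it is the only value.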
Nickil Maveli answered Nov 08 '22 20:11


You can fit and transform in a single statement. The code below encodes a single column and assigns the result back to the DataFrame.

from sklearn.preprocessing import LabelEncoder
df[columnName] = LabelEncoder().fit_transform(df[columnName])
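Note that a fresh encoder fitted on one column numbers that column's labels from 0, so E in Feat5 would still be encoded as 0 with this approach. If per-column encoding is what you want, a minimal sketch that keeps each fitted encoder around so the codes can be decoded later might look like this (the `encoders` dict and the two-column DataFrame are illustrative assumptions):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A small two-column example based on the question's data.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

encoders = {}        # column name -> fitted LabelEncoder
encoded = df.copy()

for column_name in df.columns:
    enc = LabelEncoder()
    # Each column gets its own encoder and its own 0-based mapping.
    encoded[column_name] = enc.fit_transform(df[column_name])
    encoders[column_name] = enc

# The stored encoder can map the integer codes back to the original labels.
decoded = encoders["Feat5"].inverse_transform(encoded["Feat5"])
print(list(decoded))
```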
Anvesh_vs answered Nov 08 '22 19:11