LabelEncoder specify classes in DataFrame

Tags: python, pandas, machine-learning, scikit-learn

I’m applying a LabelEncoder to a pandas DataFrame, df

Feat1  Feat2  Feat3  Feat4  Feat5
  A      A      A      A      E
  B      B      C      C      E
  C      D      C      C      E
  D      A      C      D      E

I'm applying the label encoder to the DataFrame like this:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)

This is how the labels are mapped:

A = 0
B = 1
C = 2
D = 3
E = 0

I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.

I want E to be given the value 4, but I don't know how to do this in a DataFrame.

asked Aug 11 '16 10:08

gbhrea

2 Answers

You could fit the label encoder and later transform the labels to their normalized encoding as follows:

In [4]: from sklearn import preprocessing
   ...: import numpy as np

In [5]: le = preprocessing.LabelEncoder()

In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()

In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']

In [8]: df.apply(le.transform)
Out[8]: 
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4
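As a self-contained illustration of why this works, the sketch below rebuilds the DataFrame from the question (the construction of `df` is an assumption based on the table shown there) and compares a per-column fit with a single shared fit. With `df.apply(le.fit_transform)` the encoder is refit on every column, so each column gets its own mapping; fitting once on all values gives one mapping for the whole frame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame reconstructed from the table in the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Per-column fit: the encoder is refit for each column, so the only
# value in Feat5 ("E") becomes 0 and "C" in Feat3 becomes 1.
per_column = df.apply(LabelEncoder().fit_transform)

# Shared fit: learn the classes once from every value in the frame,
# then reuse the already fitted encoder on each column.
le = LabelEncoder()
le.fit(np.unique(df.to_numpy()))
shared = df.apply(le.transform)

print(per_column)
print(shared)
```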

One way to specify the labels up front would be:

In [9]: labels = ['A', 'B', 'C', 'D', 'E']

In [10]: enc = le.fit(labels)

In [11]: enc.classes_                       # sorts the labels in alphabetical order
Out[11]: 
array(['A', 'B', 'C', 'D', 'E'], 
      dtype='<U1')

In [12]: enc.transform(['E'])               # transform expects a 1-D array-like, not a bare string
Out[12]: array([4])
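Note that LabelEncoder always stores its classes in sorted order, so passing a label list fixes which labels exist but not the integer each one receives. If the order itself has to be controlled, one alternative (a pandas technique, not part of the answer above) is a categorical dtype with an explicit category list. A minimal sketch, assuming the DataFrame from the question; the names `labels`, `cat_type` and `codes` are illustrative:

```python
import pandas as pd

# DataFrame reconstructed from the table in the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# The position of each label in this list becomes its integer code.
labels = ["A", "B", "C", "D", "E"]
cat_type = pd.CategoricalDtype(categories=labels)

# Convert every column to the shared categorical dtype and take its codes.
# Values that are missing from `labels` would be encoded as -1.
codes = df.apply(lambda col: col.astype(cat_type).cat.codes)

print(codes)
```

Reordering `labels` changes the codes accordingly, which LabelEncoder does not allow.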

answered Nov 08 '22 20:11

Nickil Maveli

You can fit and transform in a single statement. The code below encodes a single column and assigns the result back to the DataFrame.

from sklearn.preprocessing import LabelEncoder
df[columnName] = LabelEncoder().fit_transform(df[columnName])
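One thing to keep in mind with this per-column approach is that each column gets its own independent mapping, so E in Feat5 is still encoded as 0. If that is acceptable, keeping the fitted encoders around makes the encoding reversible. A rough sketch, assuming a two-column subset of the question's DataFrame; the `encoders` dictionary is a hypothetical helper, not something scikit-learn provides:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Two columns taken from the table in the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

encoders = {}          # column name -> fitted LabelEncoder
encoded = df.copy()

for column_name in df.columns:
    encoder = LabelEncoder()
    encoded[column_name] = encoder.fit_transform(df[column_name])
    encoders[column_name] = encoder

# Each column has its own mapping, so "E" is 0 in Feat5.
print(encoded)

# The stored encoder maps the integers back to the original labels.
restored = encoders["Feat5"].inverse_transform(encoded["Feat5"])
print(list(restored))
```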

answered Nov 08 '22 19:11

Anvesh_vs
