Sklearn Label Encoding multiple columns pandas dataframe

Tags: python, encoding, scikit-learn

I am trying to encode a number of columns containing categorical data ("Yes" and "No") in a large pandas dataframe. The complete dataframe contains over 400 columns, so I am looking for a way to encode all the desired columns without having to encode them one by one. I use the scikit-learn LabelEncoder to encode the categorical data.

The first part of the dataframe does not have to be encoded; however, I am looking for a method to encode all the desired columns containing categorical data directly, without splitting and concatenating the dataframe.

To demonstrate my question, I first tried to solve it on a small part of the dataframe. However, I get stuck at the last step, where the data is fitted and transformed, and get ValueError: bad input shape (4, 3). The code as I ran it:


# Import required modules
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Create a simple dataframe resembling the large dataframe
data = pd.DataFrame({'A': [1, 2, 3, 4],
                     'B': ["Yes", "No", "Yes", "Yes"],
                     'C': ["Yes", "No", "No", "Yes"],
                     'D': ["No", "Yes", "No", "Yes"]})

# Create an object of the label encoder class
labelencoder = LabelEncoder()

# Apply labelencoder object on columns; the first column does not need to be encoded
# (originally written as data.ix[:, 1:]; .ix was removed in pandas 1.0)
labelencoder.fit_transform(data.iloc[:, 1:])

Complete error report:


labelencoder.fit_transform(data.ix[:, 1:])
Traceback (most recent call last):

  File "<ipython-input-47-b4986a719976>", line 1, in <module>
    labelencoder.fit_transform(data.ix[:, 1:])

  File "C:\Anaconda\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 129, in fit_transform
    y = column_or_1d(y, warn=True)

  File "C:\Anaconda\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 562, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))

ValueError: bad input shape (4, 3)
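The traceback points at the cause: LabelEncoder.fit_transform passes its input through column_or_1d, which only accepts 1-D input (a single column), so a (4, 3) slice is rejected. A minimal sketch reproducing this, assuming a recent scikit-learn, where the message wording differs from the one above but the exception type is still ValueError:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'A': [1, 2, 3, 4],
                     'B': ["Yes", "No", "Yes", "Yes"],
                     'C': ["Yes", "No", "No", "Yes"],
                     'D': ["No", "Yes", "No", "Yes"]})

le = LabelEncoder()

# A single column is 1-D, so LabelEncoder accepts it
encoded_b = le.fit_transform(data['B'])
print(encoded_b)  # [1 0 1 1]

# Three columns at once form a 2-D (4, 3) input, which is rejected
try:
    le.fit_transform(data.iloc[:, 1:])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

LabelEncoder is designed for target labels (y), which is why it works on one column at a time.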

Does anyone know how to do this?


asked Jun 10 '17 14:06

HelloBlob

2 Answers

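As the following code shows, you can encode multiple columns by applying LabelEncoder's fit_transform to each column of the DataFrame with DataFrame.apply. However, note that the classes are not kept for every column: the single encoder only remembers the classes of the last column it was fitted on.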


import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", "No", "No", "Yes"],
                   'D': ["No", "Yes", "No", "Yes"]})
print(df)
#    A    B    C    D
# 0  1  Yes  Yes   No
# 1  2   No   No  Yes
# 2  3  Yes   No   No
# 3  4  Yes  Yes  Yes

# LabelEncoder
le = LabelEncoder()

# apply "le.fit_transform"
df_encoded = df.apply(le.fit_transform)
print(df_encoded)
#    A  B  C  D
# 0  0  1  1  0
# 1  1  0  0  1
# 2  2  1  0  0
# 3  3  1  1  1

# Note: le only keeps the classes of the last column it was fitted on (D), not of every column.
print(le.classes_)
# ['No' 'Yes']
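A sketch of an alternative that keeps the categories of every column: scikit-learn's OrdinalEncoder (a different class from LabelEncoder, available since scikit-learn 0.20) is meant for feature matrices, so it accepts the 2-D selection directly and stores one array of categories per column in categories_. It also leaves column A untouched and writes the codes back into the same DataFrame, so nothing has to be split and concatenated. The selection df.columns[1:] is an assumption matching the question's example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", "No", "No", "Yes"],
                   'D': ["No", "Yes", "No", "Yes"]})

# Columns to encode: everything except the first (assumption based on the question)
cols = df.columns[1:]

enc = OrdinalEncoder()
# OrdinalEncoder accepts 2-D input, so all selected columns are encoded in one call;
# it returns floats by default, hence the cast to int
df[cols] = enc.fit_transform(df[cols]).astype(int)
print(df)

# One array of categories is stored per encoded column
for col, cats in zip(cols, enc.categories_):
    print(col, list(cats))

# The codes can be mapped back to the original labels
print(enc.inverse_transform(df[cols].to_numpy()))
```

Categories are sorted, so "No" maps to 0 and "Yes" to 1 in every column, matching the LabelEncoder output above.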


answered Sep 28 '22 05:09

Keiku

First, find all the columns with dtype object:


objList = df.select_dtypes(include="object").columns
print(objList)

Then, to convert the objList columns to numeric codes, you can use a for loop as shown below:


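# Label encoding for object-to-numeric conversion
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

for feat in objList:
    # Cast to str so that each column holds a single type before encoding
    df[feat] = le.fit_transform(df[feat].astype(str))

# df.info() prints its summary itself and returns None, so no print() is needed
df.info()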

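Note that the loop explicitly casts each column to str with astype(str): without it, LabelEncoder can raise a TypeError on columns that mix strings with other types, such as numbers (and, in some scikit-learn versions, NaN). As with the first answer, the single le object only keeps the classes of the last column it was fitted on.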
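If the classes of every column are needed later (for example to decode the values again), one option is to keep a separate fitted LabelEncoder per column in a dict. The sketch below assumes a small hypothetical DataFrame in which column C has a missing value, and fills it with an explicit placeholder string ("missing", also hypothetical) before casting, rather than relying on how astype(str) renders NaN in a given pandas version. The column list is written out explicitly here instead of using select_dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data: column 'C' contains a missing value
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", np.nan, "No", "Yes"]})

obj_cols = ['B', 'C']  # columns to encode (assumption for this example)

# One fitted encoder per column, so every column's classes_ are kept
encoders = {}
for col in obj_cols:
    le = LabelEncoder()
    # Give missing values an explicit label, then make the column uniformly str
    df[col] = le.fit_transform(df[col].fillna("missing").astype(str))
    encoders[col] = le

for col, le in encoders.items():
    print(col, list(le.classes_))

# Decode one column back to its original labels
print(encoders['C'].inverse_transform(df['C']))
```

Classes are sorted as strings, so "missing" comes after "No" and "Yes" (lowercase letters sort after uppercase ones) and gets code 2.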

answered Sep 28 '22 05:09

Darshan Jain
