label-encoder encoding missing values

I am using the label encoder to convert categorical data into numeric values.

How does LabelEncoder handle missing values?

    from sklearn.preprocessing import LabelEncoder
    import pandas as pd
    import numpy as np

    a = pd.DataFrame(['A','B','C',np.nan,'D','A'])
    le = LabelEncoder()
    le.fit_transform(a)

Output:

array([1, 2, 3, 0, 4, 1])

For the above example, label encoder changed NaN values to a category. How would I know which category represents missing values?


asked Apr 23 '16 08:04

saurabh agarwal

1 Answer

Don't use LabelEncoder with missing values. I don't know which version of scikit-learn you're using, but in 0.17.1 your code raises TypeError: unorderable types: str() > float().

As you can see in the source, it runs numpy.unique on the data to encode, and that raises TypeError when the data mixes strings with missing values (NaN is a float). If you want to encode missing values, first replace them with a string:

    a[pd.isnull(a)] = 'NaN'
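To then find out which integer stands for the missing values, you can look the placeholder up in the fitted encoder. Here is a minimal sketch, assuming a 1-D Series instead of the DataFrame from the question and a placeholder string named MISSING (both are my choices for illustration, not part of the original code):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical placeholder; any string that does not clash with a real category works.
MISSING = "NaN"

s = pd.Series(["A", "B", "C", np.nan, "D", "A"])

# Replace missing values with the placeholder so every entry is a string.
filled = s.fillna(MISSING)

le = LabelEncoder()
codes = le.fit_transform(filled)

# classes_ holds the sorted labels; the index of a label is its integer code.
missing_code = le.transform([MISSING])[0]

print(list(le.classes_))
print(codes.tolist())
print(missing_code)
```

Because LabelEncoder sorts the labels, the code assigned to the placeholder depends on where the string sorts among the real categories, so it is safer to look it up with le.transform than to assume a fixed value.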

answered Sep 16 '22 21:09

dukebody

Tags:

python

pandas

scikit-learn