
label-encoder encoding missing values

I am using the label encoder to convert categorical data into numeric values.

How does LabelEncoder handle missing values?

from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

a = pd.DataFrame(['A','B','C',np.nan,'D','A'])
le = LabelEncoder()
le.fit_transform(a)

Output:

array([1, 2, 3, 0, 4, 1]) 

For the above example, label encoder changed NaN values to a category. How would I know which category represents missing values?

saurabh agarwal asked Apr 23 '16 08:04



1 Answer

Don't use LabelEncoder with missing values. I don't know which version of scikit-learn you're using, but in 0.17.1 your code raises TypeError: unorderable types: str() > float().

As you can see in the source, it uses numpy.unique on the data to encode, which raises TypeError if missing values are found. If you want to encode missing values, first replace them with a string:

a[pd.isnull(a)] = 'NaN'
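To find out which integer ends up standing for the missing values, one option is to look the placeholder up in the fitted encoder. The following is only a sketch, assuming a recent scikit-learn and pandas: it uses a Series instead of the one-column DataFrame from the question, and the MISSING name and the mapping dictionary are illustrative choices, not part of either library.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same values as in the question, held in a Series so the encoder gets 1-D input.
s = pd.Series(['A', 'B', 'C', np.nan, 'D', 'A'])

# Hypothetical placeholder; choose a string that cannot occur as a real label.
MISSING = 'NaN'
filled = s.fillna(MISSING)

le = LabelEncoder()
codes = le.fit_transform(filled)

# classes_ holds the sorted labels; a label's position is its integer code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}

# Look up the code assigned to the placeholder.
missing_code = int(le.transform([MISSING])[0])

print(codes)
print(mapping)
print(missing_code)
```

With this data the placeholder sorts after 'D', so it receives code 4. A different placeholder string, or different labels, would change where it lands, which is why looking it up through le.transform or le.classes_ is safer than assuming a fixed code.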
dukebody answered Sep 16 '22 21:09