
Sklearn LabelEncoder throws TypeError in sort

I am learning machine learning with the Titanic dataset from Kaggle, and I am using sklearn's LabelEncoder to transform text data into numeric labels. The following code works fine for "Sex" but not for "Embarked":

encoder = preprocessing.LabelEncoder()
features["Sex"] = encoder.fit_transform(features["Sex"])
features["Embarked"] = encoder.fit_transform(features["Embarked"])

This is the error I got:

Traceback (most recent call last):
  File "../src/script.py", line 20, in <module>
    features["Embarked"] = encoder.fit_transform(features["Embarked"])
  File "/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 131, in fit_transform
    self.classes_, y = np.unique(y, return_inverse=True)
  File "/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py", line 211, in unique
    perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
TypeError: '>' not supported between instances of 'str' and 'float'
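The traceback ends inside np.unique, which sorts the column's values. The sketch below reproduces that underlying failure with NumPy alone, assuming a small hypothetical stand-in for "Embarked" that contains one missing value. It calls NumPy directly because newer scikit-learn releases may treat missing values in LabelEncoder differently from the 2017 version in the traceback.

```python
import numpy as np

# Hypothetical stand-in for "Embarked": port codes plus one missing value.
# pandas stores a missing value as float('nan'), so this object array mixes str and float.
embarked = np.array(["S", "C", np.nan, "Q", "S"], dtype=object)

try:
    # np.unique sorts its input first; comparing a str with a float fails in Python 3.
    np.unique(embarked, return_inverse=True)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```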

[Image: description of the dataset]

Asked May 13 '17 at 18:05 by Bhavani Ravi


People also ask

What is LabelEncoder?

LabelEncoder can be used to normalize labels, that is, to map arbitrary label values to consecutive integers. It can also transform non-numerical labels (as long as they are hashable and comparable) into numerical labels. Its fit method fits the encoder to the labels, and fit_transform fits it and returns the encoded labels in one step.
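A minimal sketch of normalizing numeric labels with fit, transform and fit_transform, using a hypothetical label list with gaps:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical numeric labels with gaps; LabelEncoder "normalizes" them to 0..n-1.
labels = [1, 2, 2, 6]

le = LabelEncoder()
le.fit(labels)                           # fit: learn the distinct labels only
print(le.classes_)                       # [1 2 6]
transformed = le.transform(labels)
print(transformed)                       # [0 1 1 2]

# fit_transform does both steps in one call and returns the encoded labels.
one_step = LabelEncoder().fit_transform(labels)
print(one_step)                          # [0 1 1 2]
```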

How does sklearn's LabelEncoder work?

Label Encoder: sklearn's LabelEncoder encodes the levels of a categorical variable as numeric values. It assigns each label a value between 0 and n_classes-1, where n_classes is the number of distinct labels, following the sorted order of the labels. A label that repeats always receives the same value it was assigned earlier.
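The sketch below shows the sorted ordering and the 0..n_classes-1 range on hypothetical port codes like those in the Titanic "Embarked" column (with no missing values, so sorting succeeds):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical port codes, as in the Titanic "Embarked" column (no missing values here).
ports = ["S", "C", "S", "Q", "S", "C"]

le = LabelEncoder()
codes = le.fit_transform(ports)

# classes_ holds the distinct labels in sorted order; a label's code is its index there.
print(le.classes_)   # ['C' 'Q' 'S']
print(codes)         # [2 0 2 1 2 0]
```

Every repeated "S" gets the same code, 2, and the codes span exactly 0 to len(le.classes_) - 1.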

Is label encoding the same as ordinal encoding?

LabelEncoder is meant for encoding target values, i.e. y, not the input features X. Ordinal encoding (sklearn's OrdinalEncoder) should be used for ordinal variables, where order matters (like cold, warm, hot). Label encoding numbers the categories in sorted order regardless of meaning, so it should be used for non-ordinal (aka nominal) variables, where order doesn't matter (like blonde, brunette).
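To illustrate the difference, this sketch encodes the cold/warm/hot example both ways. LabelEncoder sorts alphabetically, while OrdinalEncoder accepts an explicit category order and expects 2-D feature input; the temperature list is hypothetical.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

temps = ["cold", "hot", "warm", "cold"]

# LabelEncoder sorts alphabetically (cold=0, hot=1, warm=2), so "hot" lands in the middle.
label_codes = LabelEncoder().fit_transform(temps)
print(label_codes)             # [0 1 2 0]

# OrdinalEncoder works on 2-D feature input (X) and accepts an explicit category order.
ordinal = OrdinalEncoder(categories=[["cold", "warm", "hot"]])
ordinal_codes = ordinal.fit_transform([[t] for t in temps])
print(ordinal_codes.ravel())   # [0. 2. 1. 0.]
```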

How do you reverse a LabelEncoder in Python?

To reverse LabelEncoder, call its inverse_transform method, which maps the encoded integers back to the original labels.
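A minimal round trip with inverse_transform, assuming hypothetical port codes as input:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["S", "C", "Q", "S"])   # hypothetical port codes
print(codes)                                      # [2 0 1 2]

# inverse_transform maps integer codes back to the original labels.
restored = le.inverse_transform(codes)
print(restored)                                   # ['S' 'C' 'Q' 'S']
```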


2 Answers

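I solved it myself. The problem was that the "Embarked" column had NaN values. NaN is a float, so the column mixed strings and floats, and sorting its unique values failed. Replacing the NaNs with a numerical value would still throw the error, since the column would still mix data types. So I replaced them with a string value instead: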

 features["Embarked"] = encoder.fit_transform(features["Embarked"].fillna('0'))
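The sketch below applies this fix to a small hypothetical slice of the features. The text columns are given an explicit object dtype to mirror how older pandas versions load them. Note that the placeholder '0' becomes a category of its own and, since '0' sorts before letters, it receives code 0.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical slice of the Titanic features with one missing "Embarked" value.
features = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "female", "male"], dtype=object),
    "Embarked": pd.Series(["S", "C", np.nan, "Q"], dtype=object),
})

encoder = preprocessing.LabelEncoder()
features["Sex"] = encoder.fit_transform(features["Sex"])
# fillna('0') turns the NaN into a string, so every value in the column is comparable.
features["Embarked"] = encoder.fit_transform(features["Embarked"].fillna("0"))

print(encoder.classes_.tolist())       # ['0', 'C', 'Q', 'S']
print(features["Embarked"].tolist())   # [3, 1, 0, 2]
```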
Answered Sep 23 '22 at 20:09 by Bhavani Ravi


Try this function. Pass it a pandas DataFrame: it checks the dtype of each column and label-encodes the object (text) columns, so you don't need to check the types yourself.

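from sklearn import preprocessing

def encoder(data):
    '''Map the categorical variables to numbers to work with scikit-learn.'''
    for col in data.columns:
        # Only object (text) columns are encoded; numeric columns are left as-is.
        # Columns containing NaN can still raise the TypeError from the question,
        # so fill missing values first (e.g. with fillna).
        if data.dtypes[col] == "object":
            le = preprocessing.LabelEncoder()
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data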
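One possible extension, sketched under assumptions: encode_object_columns below is a hypothetical variant of the encoder function above, not part of the original answer. It works on a copy, fills NaN with a string placeholder (as in the accepted answer), and keeps each fitted LabelEncoder so the mapping can later be reversed with inverse_transform. The sample frame is hypothetical, and its text columns use an explicit object dtype because newer pandas versions may infer a dedicated string dtype, which the == "object" check would not match.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing


def encode_object_columns(data, fill_value="0"):
    """Hypothetical variant of encoder(): returns an encoded copy plus the fitted encoders."""
    data = data.copy()
    encoders = {}
    for col in data.columns:
        if data.dtypes[col] == "object":
            # Fill NaN with a string so every value in the column is comparable.
            values = data[col].fillna(fill_value)
            le = preprocessing.LabelEncoder()
            data[col] = le.fit_transform(values)
            encoders[col] = le  # kept so inverse_transform can undo the mapping later
    return data, encoders


# Hypothetical Titanic-like frame with one missing "Embarked" value.
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male"], dtype=object),
    "Embarked": pd.Series(["S", np.nan, "C"], dtype=object),
    "Age": [22.0, 38.0, 26.0],
})

encoded, encoders = encode_object_columns(df)
print(encoded)
print(encoders["Embarked"].inverse_transform(encoded["Embarked"]))
```

Returning a copy leaves the caller's DataFrame unchanged, unlike the in-place version above.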
Answered Sep 21 '22 at 20:09 by Giovanni Bruner