 

Specifying the encoding order in OrdinalEncoder

I'm using OrdinalEncoder, and I cannot find how to specify the encoding order. I have categories like "bad", "average", "good" which naturally have an order, and I want to specify that order myself, since the encoder cannot know what the categories mean. With categories='auto', the categories are simply sorted (so "average" < "bad" < "good"), and some features end up encoded in the opposite direction from others. I don't want this because, at least for some of them, I know whether the correlation is positive or negative.
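To make the goal concrete, here is a minimal plain-Python sketch of ordinal encoding with a caller-supplied order. It is not scikit-learn's implementation. The helper name `ordinal_encode` is hypothetical. It only mirrors the list-of-lists shape that `OrdinalEncoder(categories=...)` expects: one ordered list of labels per column.

```python
def ordinal_encode(rows, categories):
    """Encode each column of `rows` using the order given in `categories`.

    Hypothetical helper for illustration only. `categories` holds one
    ordered list of labels per column; a label's code is its position in
    that list. Unknown labels raise ValueError.
    """
    # One lookup table per column: label -> position in the given order
    tables = [{label: code for code, label in enumerate(cats)} for cats in categories]
    encoded = []
    for row in rows:
        if len(row) != len(tables):
            raise ValueError("row length does not match the number of category lists")
        out = []
        for col, (value, table) in enumerate(zip(row, tables)):
            if value not in table:
                raise ValueError(f"unknown category {value!r} in column {col}")
            out.append(table[value])
        encoded.append(out)
    return encoded


rows = [["good"], ["bad"], ["average"], ["good"]]
print(ordinal_encode(rows, [["bad", "average", "good"]]))
# [[2], [0], [1], [2]]
```

Because the order comes from the list and not from sorting, "bad" gets 0 and "good" gets 2, which is the direction the question asks for.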

But specifying the categories results in an error during fitting:

'OrdinalEncoder' object has no attribute 'handle_unknown'.

If I do not specify the categories, fitting works fine, and I do not understand why the explicit version fails: after fitting, the categories_ attribute shows exactly the same categories I enter by hand when I try to specify them.

I specify the categories as a list of lists. Here is what happens without specifying categories:

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame(np.array([['a','a','a'], ['b','c','c']]).transpose())
oE = OrdinalEncoder(categories='auto')
oE.fit(df)

print(oE.categories_)

This prints: [array(['a'], dtype=object), array(['b', 'c'], dtype=object)]

Specifying the categories explicitly:

df = pd.DataFrame(np.array([['a','a','a'], ['b','c','c']]).transpose())
oE = OrdinalEncoder(categories=[['a'], ['b', 'c']])
oE.fit(df)

The result is this error:

Traceback (most recent call last):
  File "", line 3, in
    oE.fit(df)
  File "/home/alessio/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 774, in fit
    self._fit(X)
  File "/home/alessio/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 85, in _fit
    if self.handle_unknown == 'error':
AttributeError: 'OrdinalEncoder' object has no attribute 'handle_unknown'
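The last frame points at the cause: when explicit categories are given, the shared `_fit` method reads `self.handle_unknown`, but in scikit-learn 0.20.0 `OrdinalEncoder` has no `handle_unknown` parameter (only `OneHotEncoder` does), so the attribute is never set. With `categories='auto'` that branch is skipped, which is why the automatic version works. Assuming you are stuck on 0.20.0, a minimal unofficial sketch of a workaround is to set the attribute by hand before fitting. This pokes at a private code path, so upgrading to a fixed release is the cleaner option.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame(np.array([['a', 'a', 'a'], ['b', 'c', 'c']]).transpose())

oE = OrdinalEncoder(categories=[['a'], ['b', 'c']])
# Workaround (assumption: scikit-learn 0.20.0): _fit() looks up
# self.handle_unknown when categories are given explicitly, so define it.
# 'error' makes values outside the given categories raise ValueError.
oE.handle_unknown = 'error'
oE.fit(df)

print(oE.categories_)
print(oE.transform(df))
```

On versions where the bug is fixed, setting the attribute to 'error' should be harmless, since that is also the value the fixed code uses.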

asked Nov 14 '18 at 07:11 by Alessio Giberti





1 Answer

I had the same problem. This is a bug in scikit-learn; it has already been fixed for version 0.20.1, which has not been released yet: https://github.com/scikit-learn/scikit-learn/issues/12365

As a temporary workaround, I copied the fixed _encoders.py into my project and imported the encoder from it:

from _encoders import OrdinalEncoder
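Once scikit-learn 0.20.1 or later is installed (for example with `pip install -U scikit-learn`), passing `categories` directly should work, and for string categories the order of each inner list becomes the encoding order. A short sketch with made-up rating data, mirroring the "bad"/"average"/"good" case from the question:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ratings data; the list below fixes the encoding order
# bad -> 0, average -> 1, good -> 2 (not alphabetical order).
df = pd.DataFrame({'rating': ['good', 'bad', 'average', 'good']})

enc = OrdinalEncoder(categories=[['bad', 'average', 'good']])
encoded = enc.fit_transform(df)

print(encoded.ravel())  # [2. 0. 1. 2.]
```

With categories='auto' the same data would instead be sorted alphabetically, giving "average" the code 0.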
answered Oct 23 '22 at 11:10 by Tomas P
