I'm using OrdinalEncoder and cannot find how to specify the encoding order. I have categories like "bad", "average", "good", which have a natural order, but I want to specify that order myself, since the encoder cannot know the meaning of the categories. With categories='auto', some features are encoded in the opposite direction to others, and I do not want this because I know, at least for some of them, whether the correlation is positive or negative.
But specifying the categories results in an error during fitting:
'OrdinalEncoder' object has no attribute 'handle_unknown'.
If I do not specify the categories, the fitting process goes well, and I do not understand why (the attribute "categories_", after fitting, shows the same categories I enter by hand when I try to specify them).
I specify the categories as a list of lists. Here is what happens without specifying categories:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
df = pd.DataFrame(np.array([['a','a','a'], ['b','c','c']]).transpose())
oE = OrdinalEncoder(categories='auto')
oE.fit(df)
print(oE.categories_)
Resulting in: [array(['a'], dtype=object), array(['b', 'c'], dtype=object)]
Specifying categories explicitly:
df = pd.DataFrame(np.array([['a','a','a'], ['b','c','c']]).transpose())
oE = OrdinalEncoder(categories=[['a'], ['b', 'c']])
oE.fit(df)
The result is this error:
Traceback (most recent call last):
  File "", line 3, in
    oE.fit(df)
  File "/home/alessio/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 774, in fit
    self._fit(X)
  File "/home/alessio/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 85, in _fit
    if self.handle_unknown == 'error':
AttributeError: 'OrdinalEncoder' object has no attribute 'handle_unknown'
A numerical variable can be converted to an ordinal variable by dividing the range of the numerical variable into bins and assigning values to each bin. For example, a numerical variable between 1 and 10 can be divided into an ordinal variable with 5 labels with an ordinal relationship: 1-2, 3-4, 5-6, 7-8, 9-10.
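A minimal sketch of the binning described above, assuming pandas is available. The names values, edges and labels are illustrative and do not come from the original text.

```python
import pandas as pd

# Numerical variable taking values between 1 and 10
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Bin edges; pd.cut uses right-closed intervals by default: (0, 2], (2, 4], ...
edges = [0, 2, 4, 6, 8, 10]
labels = ["1-2", "3-4", "5-6", "7-8", "9-10"]

# The result is an ordered categorical, so the labels keep their ordinal relationship
binned = pd.cut(values, bins=edges, labels=labels)

# Integer codes follow the bin order: 0 for "1-2" up to 4 for "9-10"
codes = binned.cat.codes
print(codes.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```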
Binary encoding is a combination of hash encoding and one-hot encoding. In this scheme, the categorical feature is first converted to integers using an ordinal encoder. Each integer is then written as a binary number, and the binary digits are split into separate columns, one column per bit.
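The three steps can be sketched by hand as follows. This is only an illustration under assumptions: the column name "color" and the bit-column names are made up, and dedicated libraries may number the categories differently (for example starting from 1 rather than 0).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

colors = pd.DataFrame({"color": ["red", "green", "blue", "yellow", "green"]})

# Step 1: ordinal codes. With the default categories='auto' the categories are
# sorted, so blue=0, green=1, red=2, yellow=3.
codes = OrdinalEncoder().fit_transform(colors).astype(int).ravel()

# Step 2: number of bits needed to represent the largest code
n_bits = max(1, int(codes.max()).bit_length())

# Step 3: split each code into its bits, most significant bit first
shifts = np.arange(n_bits - 1, -1, -1)
bits = (codes[:, None] >> shifts) & 1

binary_df = pd.DataFrame(bits, columns=[f"color_bit{i}" for i in range(n_bits)])
print(binary_df.values.tolist())  # [[1, 0], [0, 1], [0, 0], [1, 1], [0, 1]]
```

Four categories fit in two bit columns here, where one-hot encoding would need four columns.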
The only difference is that LabelEncoder returned an array, while OrdinalEncoder returned each element inside an array of arrays. Does anybody know the difference? Thank you in advance!
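A small sketch of the shape difference mentioned above, assuming a recent scikit-learn. LabelEncoder is meant for a single 1-D target column, while OrdinalEncoder works on a 2-D feature matrix and encodes each column separately, which is why its output is nested. The sample values are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

values = np.array(["b", "a", "c", "a"])

# LabelEncoder: 1-D input, 1-D integer output
label_codes = LabelEncoder().fit_transform(values)
print(label_codes.shape)  # (4,)

# OrdinalEncoder: 2-D input (n_samples, n_features), 2-D float output
ordinal_codes = OrdinalEncoder().fit_transform(values.reshape(-1, 1))
print(ordinal_codes.shape)  # (4, 1)
```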
I had the same problem. This is a bug in scikit-learn. It has already been fixed, and the fix is included in version 0.20.1, which has not been released yet. https://github.com/scikit-learn/scikit-learn/issues/12365
I solved it temporarily by copying the fixed _encoders.py to my project and using:
from _encoders import OrdinalEncoder
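For reference, here is a sketch of what specifying the order looks like once a scikit-learn version containing the fix is installed. It assumes such a version and uses the "bad" / "average" / "good" categories from the question. The column name "quality" is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"quality": ["good", "bad", "average", "bad"]})

# One inner list per column; the position in the list is the encoded value,
# so bad -> 0, average -> 1, good -> 2.
enc = OrdinalEncoder(categories=[["bad", "average", "good"]])
codes = enc.fit_transform(df)

print(codes.ravel())  # [2. 0. 1. 0.]
```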