Using OrdinalEncoder to transform categorical values

Tags:

scikit-learn

I have a dataset with many columns:

No  Name  Sex  Blood  Grade  Height  Study
1   Tom   M    O      56     160     Math
2   Harry M    A      76     192     Math
3   John  M    A      45     178     English
4   Nancy F    B      78     157     Biology
5   Mike  M    O      79     167     Math
6   Kate  F    AB     66     156     English
7   Mary  F    O      99     166     Science

I want to change it to something like this:

No  Name  Sex  Blood  Grade  Height  Study
1   Tom   0    0      56     160     0
2   Harry 0    1      76     192     0
3   John  0    1      45     178     1
4   Nancy 1    2      78     157     2
5   Mike  0    0      79     167     0
6   Kate  1    3      66     156     1
7   Mary  1    0      99     166     3

I know there is a library that can do that, which is

from sklearn.preprocessing import OrdinalEncoder

I tried this, but it did not work:

enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])

Can anyone help me find what I am doing wrong and how to do that?

Thanks


asked Jun 08 '19 02:06

2 Answers

You were almost there!

The fit method only prepares the encoder (it learns the mapping from your data); it does not transform the data.

You have to call transform to transform the data, or use fit_transform, which fits and transforms the same data in one step.

enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])
df[["Sex","Blood", "Study"]] = enc.transform(df[["Sex","Blood", "Study"]])

or directly

enc = OrdinalEncoder()
df[["Sex","Blood", "Study"]] = enc.fit_transform(df[["Sex","Blood", "Study"]])
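For reference, here is a self-contained sketch of the second variant. It rebuilds the table from the question as a DataFrame; the name cols and the final cast to int are assumptions added for illustration (the encoder itself returns floats).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The table from the question, rebuilt by hand
df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Tom", "Harry", "John", "Nancy", "Mike", "Kate", "Mary"],
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Grade": [56, 76, 45, 78, 79, 66, 99],
    "Height": [160, 192, 178, 157, 167, 156, 166],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})

cols = ["Sex", "Blood", "Study"]  # hypothetical helper name
enc = OrdinalEncoder()

# fit_transform learns the categories and encodes them in one call;
# the result is a float array, so it is cast to int here
df[cols] = enc.fit_transform(df[cols]).astype(int)

print(df)
```

Name, Grade and Height are left untouched because only the columns listed in cols are passed to the encoder.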

Note: The values won't be the ones you provided, because internally the fit method uses numpy.unique, which returns the categories sorted alphabetically rather than in order of appearance.

As you can see from enc.categories_

[array(['F', 'M'], dtype=object),
 array(['A', 'AB', 'B', 'O'], dtype=object),
 array(['Biology', 'English', 'Math', 'Science'], dtype=object)]

Each value in an array is encoded by its position (F is encoded as 0, M as 1).
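If the codes should follow the order used in the question (M as 0, F as 1, and so on), one option is to pass the category order explicitly through the categories parameter of OrdinalEncoder. A minimal sketch, assuming the orders below (read off the question's desired output) and a reduced stand-in DataFrame with only the three columns:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Stand-in for the three categorical columns from the question
df = pd.DataFrame({
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})
cols = ["Sex", "Blood", "Study"]

# One list per column, in the same order as cols;
# each value is encoded by its position in its list
enc = OrdinalEncoder(categories=[
    ["M", "F"],
    ["O", "A", "B", "AB"],
    ["Math", "English", "Biology", "Science"],
])

encoded = enc.fit_transform(df[cols]).astype(int)
print(encoded)
```

With explicit categories, a value that is not in its list raises an error during fit by default, so every category that occurs in the data has to be listed.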

answered Nov 15 '22 20:11

I think it is important to point out that this is not an example of an ordinal encoding of variables. Sex, Blood and Study do not have an ordinal scale (and the person who asked the question did not suggest that they do). Ordinal data has a ranking (see e.g. https://en.wikipedia.org/wiki/Ordinal_data); the examples here do not.

If your variable is a target variable, you can use LabelEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).

Then you can do something like:

from sklearn.preprocessing import LabelEncoder

for col in ["Sex","Blood", "Study"]:
    df[col] = LabelEncoder().fit_transform(df[col])
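As a small sketch of the target-variable case, the snippet below treats Study as the target, which is an assumption made only for illustration. It shows that LabelEncoder works on a single 1-D column and that the learned classes can be used to map the codes back.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column, taken from the Study values in the question
study = pd.Series(
    ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
    name="Study",
)

le = LabelEncoder()
y = le.fit_transform(study)          # 1-D integer codes
restored = le.inverse_transform(y)   # back to the original labels

print(le.classes_)  # sorted unique labels; the code is the index in this array
print(y)
print(restored)
```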

If your variables are features, you should use OrdinalEncoder to accomplish this (see the comments on my answer).

The name OrdinalEncoder is rather unfortunate, as "ordinal" is meant in the mathematical rather than the statistical sense.

More on the difference between OrdinalEncoder and LabelEncoder in sklearn: https://datascience.stackexchange.com/questions/39317/difference-between-ordinalencoder-and-labelencoder

answered Nov 15 '22 20:11

Createdd

asmgx

abcdaire
