 

How to give column names after one-hot encoding with sklearn?

Here is my question; I hope someone can help me figure it out.

To explain: there are more than 10 categorical columns in my data set, and each of them has 200-300 categories. I want to convert them into binary values. To do that, I first used LabelEncoder to convert the string categories into numbers. The LabelEncoder code and its output are shown below.
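A minimal sketch of the per-column label-encoding step described above. The actual code is only in the screenshot, so the DataFrame and its columns `A` and `B` are hypothetical toy data. Keeping one fitted `LabelEncoder` per column means the same string-to-integer mapping can be reused on later data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real categorical columns
df = pd.DataFrame({
    "A": ["red", "green", "blue", "green"],
    "B": ["x", "y", "x", "z"],
})

# Fit one LabelEncoder per column and keep it, so the same
# string -> integer mapping can be applied to future data
encoders = {}
encoded = df.copy()
for col in df.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(df[col])

# LabelEncoder sorts the classes, so A: blue=0, green=1, red=2
print(encoded)
```

Note that scikit-learn's `OneHotEncoder` has accepted string columns directly since version 0.20, so this intermediate step is usually optional.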

https://i.stack.imgur.com/MIVHV.png

After LabelEncoder, I applied OneHotEncoder from scikit-learn, and it worked. BUT THE PROBLEM IS that I need column names after one-hot encoding. For example, take column A, which holds categorical values before encoding: A = [1,2,3,4,..]

After encoding, the columns should be named like this:

A-1, A-2, A-3

Does anyone know how to assign column names of the form (old column name - value name or number) after one-hot encoding? Here is my one-hot encoding and its output:

https://i.stack.imgur.com/kgrNa.png
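One way to get names in exactly the `A-1, A-2, A-3` form asked for above is to build them from the fitted encoder's `categories_` attribute, which lists each input column's categories in the same order as the output columns. A minimal sketch, assuming hypothetical label-encoded columns `A` and `B` in a pandas DataFrame and the default `drop=None` (so every category gets a column):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical label-encoded data with two categorical columns
X = pd.DataFrame({"A": [1, 2, 3, 2], "B": [0, 1, 0, 1]})

enc = OneHotEncoder()  # default drop=None keeps one column per category
onehot = enc.fit_transform(X).toarray()  # default output is a sparse matrix

# One name per output column, in the order the encoder emits them:
# all categories of A first, then all categories of B
names = [f"{col}-{cat}" for col, cats in zip(X.columns, enc.categories_) for cat in cats]

result = pd.DataFrame(onehot, columns=names, index=X.index)
print(result)
```

If some categories are dropped (for example with `drop="first"`), this list would no longer line up with the output columns, so the encoder's own naming method is the safer choice in that case.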

I need the columns to have names because I trained an ANN, and every time new data comes in I cannot re-encode all the past data again and again. So I want to add just the new data each time. Thanks anyway.

asked May 28 '19 at 09:05 by Aditya Pratama



People also ask

What is OneHotEncoder in Sklearn?

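One-hot encoding is a process by which categorical data (such as nominal data) are converted into numerical features of a dataset: each category becomes its own binary column, which is 1 for rows that have that category and 0 otherwise. This is often a required preprocessing step because many machine learning models accept only numerical input.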
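To make the idea concrete, here is a minimal hand-rolled sketch in plain Python. The `one_hot` function is a hypothetical illustration only, not a library API; in practice scikit-learn's `OneHotEncoder` or `pandas.get_dummies` does this job.

```python
def one_hot(values):
    """Illustrative one-hot encoder: one column per distinct category."""
    categories = sorted(set(values))  # fixed, sorted column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one 1 per row
        rows.append(row)
    return categories, rows


cats, rows = one_hot(["red", "green", "red", "blue"])
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```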

What is LabelEncoder?

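LabelEncoder encodes labels as integers between 0 and n_classes-1, so it can be used to normalize labels. It can also transform non-numerical labels (as long as they are hashable and comparable) into numerical labels. Its fit method fits the encoder, and fit_transform fits it and returns the encoded labels. The scikit-learn documentation intends it for target values (y) rather than input features (X).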
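A short sketch of the `fit_transform` / `inverse_transform` round trip described above, using hypothetical city-name labels. The classes are sorted, so the integer codes follow alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit_transform learns the classes and returns the encoded labels in one step
codes = le.fit_transform(["paris", "tokyo", "paris", "amsterdam"])

print(le.classes_.tolist())                      # ['amsterdam', 'paris', 'tokyo']
print(codes.tolist())                            # [1, 2, 1, 0]
# inverse_transform maps integer codes back to the original labels
print(le.inverse_transform([2, 0]).tolist())     # ['tokyo', 'amsterdam']
```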

What is the difference between OneHotEncoder and get_dummies?

pandas' get_dummies can't natively handle a category that was not present in the data it was first applied to; you have to apply extra techniques to handle it, and they are not efficient. OneHotEncoder, on the other hand, can handle unknown categories natively when created with handle_unknown='ignore' (by default it raises an error).


1 Answer

You can get the column names using the .get_feature_names() method.

>>> ohenc.get_feature_names()
>>> x_cat_df.columns = ohenc.get_feature_names()

Detailed example is here.

Update

From scikit-learn version 1.0 onward, use get_feature_names_out() instead; get_feature_names() was deprecated in 1.0 and removed in 1.2.

answered Sep 28 '22 at 09:09 by Venkatachalam
