Feature names from OneHotEncoder

I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The feature names the encoder produces look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc.

>>> import pandas as pd
>>> from sklearn.preprocessing import OneHotEncoder
>>> train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3, 'AgeGroup': [0, 15, 30, 45, 60, 75]})
>>> encoder = OneHotEncoder()
>>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
>>> encoder.get_feature_names()
array(['x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', 'x1_30.0', 'x1_45.0',
       'x1_60.0', 'x1_75.0'], dtype=object)

Is there a way to make OneHotEncoder prefix each feature name with the original column name, e.g. Sex_female, AgeGroup_15.0, similar to what pandas get_dummies() does?
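For comparison, here is a minimal sketch of the get_dummies() naming the question refers to, assuming the same toy data as above. get_dummies() builds each column name as "<column>_<value>":

```python
import pandas as pd

# Same toy data as in the question.
train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3,
                        'AgeGroup': [0, 15, 30, 45, 60, 75]})

# get_dummies names each indicator column "<original column>_<value>",
# with the values of each column in sorted order.
dummies = pd.get_dummies(train_X, columns=['Sex', 'AgeGroup'])
print(list(dummies.columns))
```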

Supratim Haldar asked Feb 07 '19 10:02


People also ask

What are categorical features in OneHotEncoder?

OneHotEncoder encodes categorical features using a one-hot (aka one-of-K) scheme. In scikit-learn versions before 0.20 the input had to be a matrix of integers denoting the values taken on by categorical (discrete) features; newer versions also accept strings, as in the question above. By default, the output is a sparse matrix in which each column corresponds to one possible value of one feature.

What does OneHotEncoder return?

OneHotEncoder returns a 2-D array (sparse by default) of shape n_samples × total number of categories across all encoded features, so you cannot assign the result to a single column such as df['Profession'].
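A minimal sketch of what fit_transform returns, assuming the question's toy data. The result is a SciPy sparse matrix with one row per sample and one column per (feature, category) pair; .toarray() gives a dense NumPy array:

```python
import pandas as pd
from scipy.sparse import issparse
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3,
                        'AgeGroup': [0, 15, 30, 45, 60, 75]})

encoder = OneHotEncoder()  # default: sparse output
encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])

# 6 samples; 2 Sex categories + 6 AgeGroup categories = 8 columns.
print(issparse(encoded), encoded.shape)

# Densify when a regular array is needed; each row has exactly one 1 per feature.
dense = encoded.toarray()
print(dense.sum(axis=1))
```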

How do you define OneHotEncoder?

Encode categorical features as a one-hot numeric array. By default, the encoder derives the categories from the unique values in each feature. Alternatively, you can specify the categories manually.
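Specifying categories manually is done with the `categories` parameter: one list of allowed values per input column, in column order (numeric values should be sorted). A minimal sketch, assuming the question's toy data; the extra age group 90 is a hypothetical value that does not occur in the data, so its column is all zeros:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3,
                        'AgeGroup': [0, 15, 30, 45, 60, 75]})

# One list per input column, in the same order as the columns passed to fit.
# 90 is a hypothetical age group not present in train_X.
encoder = OneHotEncoder(categories=[['female', 'male'],
                                    [0, 15, 30, 45, 60, 75, 90]])
encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']]).toarray()

print(encoder.categories_)
print(encoded.shape)  # 2 + 7 = 9 columns; the column for 90 is all zeros
```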

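What is the difference between OneHotEncoder and get_dummies?

pd.get_dummies() cannot natively handle categories that were not seen during training: a test set with a new value produces an extra column, and a test set missing a value produces one column fewer, so you have to align the columns yourself (for example by reindexing to the training columns), which is more cumbersome. OneHotEncoder, on the other hand, handles unknown categories natively when created with handle_unknown='ignore'; with the default handle_unknown='error', transform raises an error instead.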
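A minimal sketch of handle_unknown='ignore', assuming the question's toy data for fitting and a hypothetical one-row test set whose AgeGroup (90) was never seen during fit. The unseen value is encoded as all zeros in the AgeGroup columns, and the output keeps the same 8 columns as the training encoding:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3,
                        'AgeGroup': [0, 15, 30, 45, 60, 75]})
# Hypothetical test row with an AgeGroup value never seen during fit.
test_X = pd.DataFrame({'Sex': ['female'], 'AgeGroup': [90]})

# With the default handle_unknown='error', transform would raise on 90;
# 'ignore' encodes it as all zeros in that feature's columns instead.
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train_X[['Sex', 'AgeGroup']])
encoded_test = encoder.transform(test_X[['Sex', 'AgeGroup']]).toarray()

print(encoded_test)
```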


2 Answers

You can pass the list of original column names to get_feature_names:

encoder.get_feature_names(['Sex', 'AgeGroup']) 

will return:

['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',  'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'] 
kabochkov answered Sep 17 '22 19:09


column_name = encoder.get_feature_names(['Sex', 'AgeGroup'])
one_hot_encoded_frame = pd.DataFrame(train_X_encoded.toarray(), columns=column_name)  # output is sparse by default
Nursnaaz answered Sep 20 '22 19:09