I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The resulting feature names from the encoder look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc.
>>> import pandas as pd
>>> from sklearn.preprocessing import OneHotEncoder
>>> train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3, 'AgeGroup': [0, 15, 30, 45, 60, 75]})
>>> encoder = OneHotEncoder()
>>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
>>> encoder.get_feature_names()
array(['x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', 'x1_30.0', 'x1_45.0', 'x1_60.0', 'x1_75.0'], dtype=object)
Is there a way to tell OneHotEncoder to create the feature names in such a way that the column name is added at the beginning, something like Sex_female, AgeGroup_15.0 etc., similar to what Pandas get_dummies() does?
OneHotEncoder. Encode categorical integer features using a one-hot aka one-of-K scheme. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature.
A one-hot encoder returns a 2D array of shape data_length x num_categories, so you cannot assign its output to a single column such as df['Profession'].
Encode categorical features as a one-hot numeric array. By default, the encoder derives the categories based on the unique values in each feature. Alternatively, you can also specify the categories manually.
(1) get_dummies cannot natively handle a category at transform time that it did not see during training. You have to work around this yourself, which is not efficient. OneHotEncoder, on the other hand, can handle unknown categories natively when it is created with handle_unknown='ignore'.
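The difference in handling unseen categories can be sketched as follows. This is a minimal illustration, not taken from the original posts: the `train` and `test` frames and the `'other'` value are made up for the example, and it assumes a scikit-learn version where `OneHotEncoder` accepts string columns directly.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 'other' appears only at transform time.
train = pd.DataFrame({"Sex": ["male", "female", "male"]})
test = pd.DataFrame({"Sex": ["female", "other"]})

# With handle_unknown='ignore', an unseen category is encoded as all zeros
# and the output keeps the columns learned during fit.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)
encoded_test = encoder.transform(test).toarray()

# get_dummies has no fitted state: it builds columns from whatever values
# are present in the frame it is given.
dummies_test = pd.get_dummies(test["Sex"])

print(encoded_test)
print(list(dummies_test.columns))
```

The encoder output always has the two columns learned from `train`, while the `get_dummies` columns depend on the contents of `test`, so they would have to be realigned by hand.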
You can pass the list with original column names to get_feature_names:
encoder.get_feature_names(['Sex', 'AgeGroup'])
will return:
['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15', 'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75']
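column_name = encoder.get_feature_names(['Sex', 'AgeGroup'])
# fit_transform returns a SciPy sparse matrix by default, so convert it to a dense array first
one_hot_encoded_frame = pd.DataFrame(train_X_encoded.toarray(), columns=column_name)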
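Newer scikit-learn releases replace `get_feature_names` with `get_feature_names_out`, so the call above may not exist on a recent installation. The sketch below uses whichever method is available; the `cols` variable and the `hasattr` fallback are additions for illustration, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({
    "Sex": ["male", "female"] * 3,
    "AgeGroup": [0, 15, 30, 45, 60, 75],
})
cols = ["Sex", "AgeGroup"]

encoder = OneHotEncoder()
train_X_encoded = encoder.fit_transform(train_X[cols])  # sparse matrix by default

# Prefer the newer method name; fall back to the older one if it is missing.
if hasattr(encoder, "get_feature_names_out"):
    column_name = encoder.get_feature_names_out(cols)
else:
    column_name = encoder.get_feature_names(cols)

# Densify the sparse result and keep the original row index.
one_hot_encoded_frame = pd.DataFrame(
    train_X_encoded.toarray(), columns=column_name, index=train_X.index
)
print(list(one_hot_encoded_frame.columns))
```

Each row of the resulting frame has exactly two ones, one for the Sex column group and one for the AgeGroup column group.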