
OneHotEncoder categorical_features deprecated, how to transform specific column

I need to transform an independent variable from strings to a numerical representation. I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are:

Country     |    Age
--------------------------
Germany     |    23
Spain       |    25
Germany     |    24
Italy       |    30

I have to encode the Country column like

0     |    1     |     2     |       3
--------------------------------------
1     |    0     |     0     |      23
0     |    1     |     0     |      25
1     |    0     |     0     |      24
0     |    0     |     1     |      30

I succeeded in getting the desired transformation by using OneHotEncoder as follows:

    # Encoding the categorical data
    from sklearn.preprocessing import LabelEncoder

    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

    # We are dummy encoding as the machine learning algorithms will be
    # confused with the values like Spain > Germany > France
    from sklearn.preprocessing import OneHotEncoder

    onehotencoder = OneHotEncoder(categorical_features=[0])
    X = onehotencoder.fit_transform(X).toarray()

Now I'm getting a deprecation message telling me to use categories='auto'. If I do so, the transformation is applied to all independent columns (country, age, salary, etc.).

How can I apply the transformation to the 0th column of the dataset only?

Hassaan asked Jan 24 '19 11:01




1 Answer

There are actually two warnings:

FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.

and the second:

DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.

Going forward, you should not specify the columns inside OneHotEncoder itself: with "categories='auto'" the encoder is applied to every column it receives. The first message also tells you that OneHotEncoder can be used directly, without a LabelEncoder step first. Finally, the second message tells you to use ColumnTransformer, which works like a Pipeline that applies transformations to selected columns.

Here is the equivalent code for your case:
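To illustrate the point about skipping LabelEncoder, here is a minimal sketch that assumes scikit-learn 0.20 or later and uses only the Country values from the question. The variable names `countries`, `encoder`, and `encoded` are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Country column from the question as a 2-D array of strings (4 rows x 1 column).
countries = np.array([["Germany"], ["Spain"], ["Germany"], ["Italy"]])

# No LabelEncoder step: the encoder accepts string categories directly.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(countries).toarray()

# Categories are taken from the unique values, in sorted order.
print(encoder.categories_)
print(encoded)
```

Note that the output columns follow the sorted category order (Germany, Italy, Spain), which may differ from the order in which the values first appear in the data.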

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    # The last element of the tuple ([0]) is the list of columns you want to transform in this step
    ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
    ct.fit_transform(X)

See also: ColumnTransformer documentation
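As a self-contained sketch of the ColumnTransformer approach, the snippet below rebuilds the four sample rows from the question as an object array and encodes only column 0. It assumes scikit-learn 0.20 or later; the step name "country" and the variable `X_encoded` are arbitrary choices for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample rows from the question: column 0 is Country, column 1 is Age.
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",              # leave the other columns unchanged
)
X_encoded = ct.fit_transform(X)

# The one-hot columns come first (sorted categories), followed by Age.
print(X_encoded)
```

The encoded columns are placed before the passthrough columns, so Age ends up as the last column, as in the desired output in the question.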

For the above example:

Encoding categorical data (i.e., converting text such as the country name to numerical data):

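    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn.compose import ColumnTransformer

    # Encode the Country column
    # (the LabelEncoder step is optional: OneHotEncoder accepts strings directly since 0.20)
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

    # One-hot encode column 0 and pass the remaining columns through unchanged
    ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder='passthrough')
    X = ct.fit_transform(X)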
CoMartel answered Oct 14 '22 19:10
