I have 2 columns, 'Sex' and 'Class':
When I call pd.get_dummies() on these 2 columns, only 'Sex' is encoded (into 2 columns). 'Class' is left unconverted by get_dummies.
I want 'Class' to be converted into 10 dummy columns as well, similar to One Hot Encoding.
Is this expected behavior? Is there a workaround?
By default, pandas treats columns of object (string) or category dtype as categorical variables. These non-numerical columns usually have to be transformed into numbers before modeling, and get_dummies is the pandas function that does this.
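To see which columns get_dummies picks up on its own, here is a small sketch with made-up data: one string column, one category column and one integer column holding class codes. I pass dtype=int only so the output is 0/1 integers on every pandas version (newer pandas returns booleans by default).

```python
import pandas as pd

# Hypothetical example frame with three different dtypes.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],          # object dtype
    "size": pd.Categorical(["S", "M", "S"]),  # category dtype
    "class": [0, 1, 2],                       # int64 dtype
})

# With columns=None (the default), only the object/category columns are
# encoded; the integer column is passed through unchanged and placed first.
encoded = pd.get_dummies(df, dtype=int)
print(list(encoded.columns))
# ['class', 'color_blue', 'color_red', 'size_M', 'size_S']
```

The 'class' column survives untouched, which is exactly what happens to 'Class' in the question.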
We frequently call these 0/1 variables “dummy” variables, but they are also sometimes called indicator variables. In machine learning, this is also sometimes referred to as “one-hot” encoding of categorical data. Now that you understand what dummy variables are, let’s talk about the Pandas get_dummies function.
The Pandas get_dummies function, pd.get_dummies(), lets you easily one-hot encode your categorical data. In this tutorial, you'll learn how the Pandas get_dummies function works and how to customize it. One-hot encoding is a common preprocessing step for categorical data in machine learning.
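As a minimal illustration (the animal data is made up), encoding a single Series produces one 0/1 indicator column per distinct value, and every row has exactly one 1. For a Series, the column names are just the values, because no prefix is applied by default.

```python
import pandas as pd

# A small categorical Series (hypothetical data).
s = pd.Series(["cat", "dog", "cat", "bird"], name="animal")

# One indicator column per distinct value, in sorted order.
onehot = pd.get_dummies(s, dtype=int)
print(onehot)
#    bird  cat  dog
# 0     0    1    0
# 1     0    0    1
# 2     0    1    0
# 3     1    0    0
```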
One-hot encoding converts a column with n distinct values into n variables, while dummy encoding creates n-1 variables. By default, pandas one-hot encodes your data; pass drop_first=True to get the n-1 dummy variables instead.
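Here is a short sketch of the difference, using a made-up three-level variable. With drop_first=True, the first level in sorted order ('high' here) loses its column and is represented by a row of all zeros.

```python
import pandas as pd

s = pd.Series(["low", "mid", "high", "mid"])  # hypothetical 3-level variable

full = pd.get_dummies(s, dtype=int)                      # one-hot: n = 3 columns
reduced = pd.get_dummies(s, drop_first=True, dtype=int)  # dummy: n - 1 = 2 columns

print(full.columns.tolist())     # ['high', 'low', 'mid']
print(reduced.columns.tolist())  # ['low', 'mid']
```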
One workaround is to convert all values to strings, because get_dummies encodes object (string) columns by default:
df1 = pd.get_dummies(df.astype(str))
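To check this workaround end to end, the sketch below builds hypothetical data shaped like the question ('Sex' as text, 'Class' as integers 0 to 9) and compares the result before and after astype(str).

```python
import pandas as pd

# Hypothetical data shaped like the question.
df = pd.DataFrame({
    "Sex": ["male", "female"] * 5,
    "Class": list(range(10)),
})

# Without conversion, only 'Sex' is encoded: Class + Sex_female + Sex_male.
before = pd.get_dummies(df, dtype=int)
print(before.shape)  # (10, 3)

# After astype(str), 'Class' is a string column too, so both are encoded.
df1 = pd.get_dummies(df.astype(str), dtype=int)
print(df1.shape)     # (10, 12)
```

One caveat: the dummy columns are ordered by the string values, so with 10 or more classes you get 'Class_0', 'Class_1', 'Class_10', 'Class_2', ... instead of numeric order.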
If you don't want to convert your data, you can use the columns argument of get_dummies. Here is a quick walkthrough:
Here is the DataFrame reproduced from your description:
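import pandas as pd

sex_labels = ['male', 'female']
sex_col = [sex_labels[i % 2] for i in range(10)]  # alternating male/female
class_col = list(range(10))                        # integer class labels 0..9
df = pd.DataFrame({'sex': sex_col, 'class': class_col})
df['sex'] = pd.Categorical(df['sex'])              # make 'sex' a category dtype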
The dtypes are:
print(df.dtypes)
sex category
class int64
dtype: object
Apply get_dummies:
df = pd.get_dummies(df, columns=['sex', 'class'])
Verify:
print(df.columns)
Output:
Index(['sex_female', 'sex_male', 'class_0',
       'class_1', 'class_2', 'class_3', 'class_4', 'class_5',
       'class_6', 'class_7', 'class_8', 'class_9'], dtype='object')
Per the docs at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html:
"If columns is None then all the columns with object or category dtype will be converted."
This is why you only see dummies for the sex column and not for class: class has int64 dtype, so it is only encoded when you list it in columns.
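Following the same rule from the docs, another option is to give 'class' the category dtype, after which the default columns=None picks it up. This is a sketch on the same reproduced data; dtype=int is only there so the output is 0/1 integers on any pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": pd.Categorical(["male", "female"] * 5),
    "class": list(range(10)),
})

# Marking 'class' as categorical makes it eligible for the default
# columns=None selection, so no explicit columns= list is needed.
df["class"] = df["class"].astype("category")
encoded = pd.get_dummies(df, dtype=int)
print(encoded.columns.tolist())
# ['sex_female', 'sex_male', 'class_0', ..., 'class_9']
```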
Hope this helps. Happy learning!
Note: Tested with pandas version '0.25.2'