How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?
For example, if your DataFrame "df" has a categorical variable "Gender", you can create dummy variables with: df_dc = pd.get_dummies(df, columns=['Gender']). If you have several categorical variables, add each column name as a string to the list.
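(1) get_dummies cannot natively handle unknown categories at transformation time: applied to new data, it creates columns only for the categories that data happens to contain. You have to work around this yourself, and such workarounds are not efficient. scikit-learn's OneHotEncoder, on the other hand, handles unknown categories natively (for example with handle_unknown='ignore').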
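One common workaround on the pandas side can be sketched as follows: encode the training and new data separately, then align the new data to the training columns. This is a minimal sketch, not the only approach; the "train" and "test" frames and the category "X" are made up for illustration, and the dtype argument assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical data: "X" never appears in the training set.
train = pd.DataFrame({"Gender": ["F", "M", "F"]})
test = pd.DataFrame({"Gender": ["M", "X"]})

train_dummies = pd.get_dummies(train, columns=["Gender"], dtype=int)
test_dummies = pd.get_dummies(test, columns=["Gender"], dtype=int)

# Align to the training columns: the unseen Gender_X column is dropped,
# and the missing Gender_F column is added and filled with zeros.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

print(test_aligned)
```

After alignment the new data has exactly the training columns, so a row with an unseen category ends up with zeros everywhere, which is comparable to what OneHotEncoder does with handle_unknown='ignore'.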
get_dummies() is used for data manipulation. It converts categorical data into dummy or indicator variables.
With pandas 0.19, you can do that in a single line:
pd.get_dummies(data=df, columns=['A', 'B'])
The columns argument specifies which columns to one-hot encode.
>>> df
   A  B  C
0  a  c  1
1  b  c  2
2  a  b  3

>>> pd.get_dummies(data=df, columns=['A', 'B'])
   C  A_a  A_b  B_b  B_c
0  1  1.0  0.0  0.0  1.0
1  2  0.0  1.0  0.0  1.0
2  3  1.0  0.0  1.0  0.0
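For reference, a self-contained version of the example above, together with a more general pattern for any function that takes one column and returns several. This is a sketch under assumptions: dtype=int is passed so the indicators are integers (the default dtype of the output has changed between pandas versions), and the names "encoded", "parts" and "manual" are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# Single call: encodes A and B, leaves C untouched and places it first.
encoded = pd.get_dummies(data=df, columns=["A", "B"], dtype=int)

# General pattern: apply a column -> DataFrame function to each column,
# then concatenate the results next to the remaining columns.
parts = [pd.get_dummies(df[col], prefix=col, dtype=int) for col in ["A", "B"]]
manual = pd.concat([df.drop(columns=["A", "B"])] + parts, axis=1)

print(encoded)
```

The list comprehension plus pd.concat works for any per-column transformation, while the columns argument is the shortcut specific to get_dummies.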