 

Running get_dummies on several DataFrame columns?

How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?

Emre asked Jun 08 '14 19:06


People also ask

How do I create dummy variables for multiple columns in Python?

For example, if your DataFrame df has a categorical variable "Gender", you can create dummy variables with df_dc = pd.get_dummies(df, columns=['Gender']). If you have several categorical variables, simply add each variable name as a string to the list.
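A minimal sketch of the multi-column case, assuming a recent pandas. Only the Gender column comes from the text above; the data and the Smoker and Age columns are hypothetical. The dtype argument is used so the indicators come out as 0/1 integers.

```python
import pandas as pd

# Hypothetical data: "Gender" is from the text above, "Smoker" and "Age" are made up.
df = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Smoker": ["yes", "no", "yes"],
    "Age": [34, 29, 41],
})

# List every categorical column to encode; columns not listed (Age) pass through unchanged.
df_dc = pd.get_dummies(df, columns=["Gender", "Smoker"], dtype=int)
print(df_dc)
```

The untouched columns come first, followed by one indicator column per category of each encoded column (Gender_F, Gender_M, Smoker_no, Smoker_yes).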

What is the difference between OneHotEncoder and get_dummies?

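get_dummies cannot natively handle categories that were not present when you first encoded your data: encoding new data simply produces whatever columns that data contains, so you have to align the columns yourself, which adds extra steps. scikit-learn's OneHotEncoder, by contrast, learns the categories when it is fitted and, with handle_unknown='ignore', encodes unseen categories as all zeros.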
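One common workaround for get_dummies, sketched under the assumption that the categories seen in the first (training) frame are the reference set: encode both frames, then reindex the new frame to the training columns. The frames and the color column are hypothetical.

```python
import pandas as pd

# Hypothetical training data and new data; "purple" never appears in training.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
new = pd.DataFrame({"color": ["green", "purple"]})

train_enc = pd.get_dummies(train, columns=["color"], dtype=int)
new_enc = pd.get_dummies(new, columns=["color"], dtype=int)

# Align to the training columns: the unseen category's column (color_purple) is dropped,
# and training categories absent from the new data (color_blue, color_red) are filled with 0.
new_aligned = new_enc.reindex(columns=train_enc.columns, fill_value=0)
print(new_aligned)
```

The "purple" row ends up as all zeros, which matches what OneHotEncoder(handle_unknown='ignore') produces for an unknown category without the manual alignment step.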

What does the get_dummies() function in pandas do?

get_dummies() converts categorical data into dummy (indicator) variables: it creates one column per distinct value and marks which rows contain that value.
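A small illustration with a hypothetical Series: each distinct value becomes its own indicator column, and the prefix parameter is prepended to the new column names.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])  # hypothetical categorical data

# One indicator column per distinct value (a, b, c), named with the given prefix.
dummies = pd.get_dummies(s, prefix="letter", dtype=int)
print(dummies)
```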


1 Answer

With pandas 0.19, you can do that in a single line:

pd.get_dummies(data=df, columns=['A', 'B']) 

The columns argument specifies which columns to one-hot encode; the remaining columns (C in the example below) are kept unchanged.

>>> df
   A  B  C
0  a  c  1
1  b  c  2
2  a  b  3
>>> pd.get_dummies(data=df, columns=['A', 'B'])
   C  A_a  A_b  B_b  B_c
0  1  1.0  0.0  0.0  1.0
1  2  0.0  1.0  0.0  1.0
2  3  1.0  0.0  1.0  0.0
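The same result can be built by hand, which is also the general pattern for any function that takes one column and returns several: apply it to each column and concatenate the pieces next to the untouched columns. This sketch reuses the example frame; note that recent pandas versions (2.0 and later) return boolean indicator columns rather than the floats shown above, so pass dtype=float or dtype=int if you need numbers.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})
cols = ["A", "B"]

# Manual pattern: run the single-column function on each column,
# then place the results side by side after the columns that were not encoded.
manual = pd.concat(
    [df.drop(columns=cols)] + [pd.get_dummies(df[c], prefix=c) for c in cols],
    axis=1,
)

# The built-in columns= argument yields the same frame (raises if they differ).
builtin = pd.get_dummies(df, columns=cols)
pd.testing.assert_frame_equal(manual, builtin)
print(manual)
```

The built-in form is shorter, but the manual loop works with any per-column function, not only get_dummies.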
mxdbld answered Sep 23 '22 14:09