
How to create dummies for certain columns with pandas.get_dummies()

Tags:

python

pandas

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                  'C': ['1', '2', '3'],
                  'D':['j', 'l', 'j']})

I only want dummies for columns A and D, not for columns B and C. If I use pd.get_dummies(df), every column is turned into dummies.

The final result should still contain all of the columns, with B and C left unchanged, for example: 'A_x', 'A_y', 'B', 'C', 'D_j', 'D_l'.

Jack asked May 17 '16 00:05



4 Answers

This can be done without concatenation by passing the columns to encode to get_dummies() through its columns parameter:

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
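The dtype of the indicator columns depends on the pandas version (floats in the output above, booleans on recent releases). The following is a minimal sketch, assuming a pandas version whose get_dummies accepts the dtype argument, that asks for 0/1 integers explicitly. It also leaves out prefix, since the prefix defaults to the source column names.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode only A and D; B and C are passed through unchanged.
# dtype=int requests integer indicators regardless of the version default.
encoded = pd.get_dummies(df, columns=['A', 'D'], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

The untouched columns come first in the result, followed by the dummies in the order given in columns.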
knagaev answered Oct 06 '22 03:10


Adding to the other answers: if you have a big dataset with many attributes and don't want to list every dummy column by hand, you can use a set difference. Note that a set does not preserve the original column order.

# Suppose len(df.columns) == 50
non_dummy_cols = ['A','B','C'] 
# Takes all 47 other columns
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
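If the order of the dummy columns matters, one possible variant is to take the difference on the column Index itself instead of going through set(). This is only a sketch: the small frame and the choice of 'B' as the column to keep are made up for illustration, and it assumes a pandas version where Index.difference accepts sort=False.

```python
import pandas as pd

# Illustrative data, not from the original answer
df = pd.DataFrame({'A': ['x', 'y'], 'B': [1, 2],
                   'C': ['p', 'q'], 'D': ['j', 'l']})
non_dummy_cols = ['B']

# sort=False keeps the columns in their original left-to-right order
dummy_cols = df.columns.difference(non_dummy_cols, sort=False)

result = pd.get_dummies(df, columns=list(dummy_cols), dtype=int)
print(result.columns.tolist())
```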
Patric Fulop answered Oct 06 '22 02:10


Select just the two columns you want to pass to get_dummies(). The resulting column names combine the source column with the value each binary indicator represents. Then use pd.concat() to attach the original columns you want to keep unchanged:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
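Because pd.concat() places the pieces in the order they are listed, the same idea can reproduce the exact layout requested in the question ('A_x', 'A_y', 'B', 'C', 'D_j', 'D_l'). A small sketch, assuming a pandas version that supports the dtype argument (used here only to get 0/1 integers):

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode A and D separately so B and C can sit between them
parts = [
    pd.get_dummies(df[['A']], dtype=int),
    df[['B', 'C']],
    pd.get_dummies(df[['D']], dtype=int),
]
result = pd.concat(parts, axis=1)
print(result.columns.tolist())
```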
Stefan answered Oct 06 '22 02:10


  • The other answers are great for the specific example in the OP
  • This answer is for cases where there may be many columns, and it's too cumbersome to type out all the column names
  • This is a non-exhaustive solution to specifying many different columns to get_dummies while excluding some columns.
  • Using the built-in filter() function on df.columns is also an option.
  • When columns=None, pd.get_dummies only converts columns with object, string, or category dtype.
    • Another option is therefore to give the object dtype only to the columns that should be transformed, and make sure the columns that shouldn't be transformed are not object dtype.
  • Using set(), as shown in Patric Fulop's answer, is yet another option.
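The filter() and dtype-based alternatives mentioned in the list above might look roughly like this. It is a sketch on made-up numeric data, not part of the original answer: the names excluded, cols and typed are illustrative, and dtype=int assumes a pandas version that supports that argument.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 4],
                   'C': [5, 6, 7], 'D': [8, 9, 8]})

# Alternative 1: built-in filter() on df.columns
excluded = {'B', 'C'}
cols = list(filter(lambda c: c not in excluded, df.columns))
by_filter = pd.get_dummies(df, columns=cols, dtype=int)

# Alternative 2: cast only the columns to encode to object dtype,
# then rely on the columns=None default to pick them up
typed = df.astype({c: object for c in cols})
by_dtype = pd.get_dummies(typed, dtype=int)

print(by_filter.columns.tolist())
print(by_dtype.columns.tolist())
```

Both calls leave B and C as plain integer columns and encode only A and D.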
import pandas as pd
import string  # for data
import numpy as np

# create test data
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)), columns=list(string.ascii_uppercase[:10]))

# display(df)
   A  B  C  D  E  F  G  H  I  J
0  1  2  1  2  1  1  2  3  2  2
1  2  1  3  3  1  2  2  1  2  1
2  2  3  1  3  2  2  1  2  3  3
3  3  2  1  2  3  2  3  1  3  1
4  1  1  1  3  3  1  2  1  2  1

Option 1

  • If the excluded columns are fewer than the included columns, specify the columns to remove, and then use a list comprehension to remove them from the list being passed to the columns= parameter.
# columns not to transform
not_cols = ['C', 'G']

# get dummies
df_dummies = pd.get_dummies(data=df, columns=[col for col in df.columns if col not in not_cols])

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0

Option 2

  • If the columns to remove are at the beginning or end, slice df.columns
df_dummies = pd.get_dummies(data=df, columns=df.columns[2:])

   A  B  C_1  C_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  G_1  G_2  G_3  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    1    0    1    0    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  2  1    0    1    0    1    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0
2  2  3    1    0    0    1    0    1    0    0    1    1    0    0    0    1    0    0    1    0    0    1
3  3  2    1    0    1    0    0    0    1    0    1    0    0    1    1    0    0    0    1    1    0    0
4  1  1    1    0    0    1    0    0    1    1    0    0    1    0    1    0    0    1    0    1    0    0

Option 3

  • Specify slices, then concat the excluded columns to the dummies
    • Uses pd.concat, similar to Stefan's answer, but with more columns.
  • np.r_ expands the slice objects into one concatenated array of integer column positions
slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
excluded = [2, 6]

df_dummies = pd.concat([df.iloc[:, excluded], pd.get_dummies(data=df.iloc[:, slices].astype(object))], axis=1)

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
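To see what np.r_ contributes in Option 3, it can help to look at the positions it produces in isolation. A short sketch (the variable names are illustrative):

```python
import numpy as np

# Slice objects are expanded and concatenated into one index array
positions = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]

# The same thing written with slice literals
same = np.r_[0:2, 3:6, 7:10]

# The positions left out are the columns that stay untouched (C and G)
excluded = np.setdiff1d(np.arange(10), positions)

print(positions)
print(excluded)
```

Deriving excluded from positions this way avoids keeping two hand-written lists in sync.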
Trenton McKinney answered Oct 06 '22 01:10