How to create dummy variables using pandas with reference to one value?

Question

import pandas as pd

test = {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
test = pd.DataFrame(test)
dummy = pd.get_dummies(test['ngrp'], drop_first=True)

This gives me:

   Brooklyn  Manhattan  Queens  Staten Island
0         0          1       0              0
1         1          0       0              0
2         0          0       1              0
3         0          0       0              1
4         0          0       0              0

Bronx becomes my reference level (because that is the column that gets dropped). How do I specify that Manhattan should be the reference level instead? My expected output is:

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1

cs95 · Accepted Answer

get_dummies sorts your values (lexicographically) and then creates the dummies. That's why you don't see "Bronx" in your initial result: it was the first value in sorted order, so drop_first=True dropped it.
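A small check of the sorting behavior described above, using the data from the question. The variable names `full` and `reduced` are only illustrative.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

# Without drop_first, the dummy columns follow sorted order,
# not the order in which the values appear in the data.
full = pd.get_dummies(test['ngrp'])
print(list(full.columns))

# drop_first=True removes the first of those sorted columns.
reduced = pd.get_dummies(test['ngrp'], drop_first=True)
print(list(reduced.columns))
```

Comparing the two printed lists shows which level ended up as the reference.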

To avoid the behavior you see, enforce the ordering to be on a "first-seen" basis (i.e., convert it to an ordered categorical).

pd.get_dummies(
    pd.Categorical(test['ngrp'], categories=test['ngrp'].unique(), ordered=True),
    drop_first=True)

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1
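Note that the first-seen ordering gives Manhattan as the reference only because Manhattan happens to be the first row. If the reference level can appear anywhere in the column, one possible generalization is to list it first in the categories yourself. The sketch below assumes a hypothetical helper named `dummies_with_reference` (not part of pandas). It passes `dtype=int` only so the result prints as 0/1; newer pandas versions return boolean dummies by default.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

def dummies_with_reference(values, reference):
    # Hypothetical helper, not a pandas function: put the reference level
    # first so that drop_first=True removes exactly that level.
    # `values` is assumed to be a pandas Series.
    levels = [reference] + [v for v in values.unique() if v != reference]
    cat = pd.Categorical(values, categories=levels)
    return pd.get_dummies(cat, drop_first=True, dtype=int)

# Use Queens as the reference level, although it is neither first in the
# data nor first in sorted order.
dummy = dummies_with_reference(test['ngrp'], 'Queens')
print(dummy)
```

The row for the reference level (here the Queens row) comes out as all zeros, and every other level keeps its own column.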

Of course, this has a side effect: the column labels of the result are categorical (a CategoricalIndex) rather than a plain Index. That's almost never an issue.
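If the categorical column labels do get in the way, here are two possible workarounds, offered as a sketch rather than as part of the original answer. The first rebuilds the labels as a plain list. The second is a different technique: build every dummy and drop the reference column by name. The names `cat_cols` and `plain` are illustrative.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

ordered = pd.Categorical(
    test['ngrp'], categories=test['ngrp'].unique(), ordered=True
)
dummy = pd.get_dummies(ordered, drop_first=True)

# Keep a handle on the original labels to inspect their type.
cat_cols = dummy.columns
print(type(cat_cols).__name__)

# Workaround 1: replace the categorical labels with plain ones.
dummy.columns = list(cat_cols)

# Workaround 2: never create a categorical. Build all dummies and
# drop the reference column explicitly.
plain = pd.get_dummies(test['ngrp']).drop(columns='Manhattan')
print(list(plain.columns))
```

The second form leaves the remaining columns in sorted order rather than first-seen order, so the column order differs from the expected output in the question even though the encoding is equivalent.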

Tags: python, pandas, dataframe, dummy-variable

John peter
