How to get pandas get_dummies to emit N-1 variables to avoid collinearity?

pandas.get_dummies emits one dummy variable per categorical value. Is there an automated, easy way to ask it to create only N-1 dummy variables (that is, to drop one arbitrary "baseline" variable)?

This is needed to avoid collinearity in our dataset.

497

asked Jul 19 '15 05:07

ihadanny

2 Answers

Pandas version 0.18.0 implemented exactly what you're looking for: the drop_first option. Here's an example:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.18.1'

In [3]: s = pd.Series(list('abcbacb'))

In [4]: pd.get_dummies(s, drop_first=True)
Out[4]:
     b    c
0  0.0  0.0
1  1.0  0.0
2  0.0  1.0
3  1.0  0.0
4  0.0  0.0
5  0.0  1.0
6  1.0  0.0
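The same option also works when encoding several columns of a DataFrame at once. The following is a minimal sketch, assuming a made-up frame with hypothetical column names (color, size, price); it is only meant to show which columns survive. The dtype of the dummy columns differs between pandas versions, so the sketch inspects column names only.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "S", "L"],
    "price": [1.0, 2.5, 3.0, 4.0],
})

# Encode only the categorical columns; drop_first=True removes the first
# (alphabetically sorted) level of each, leaving N-1 dummies per column.
encoded = pd.get_dummies(df, columns=["color", "size"], drop_first=True)

# "blue" and "L" are the dropped baselines.
print(encoded.columns.tolist())
```

Note that the baseline is always the first level in sorted order, so drop_first gives no direct control over which category is dropped.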

121

answered Oct 13 '22 20:10

T.C. Proctor

There are a number of ways of doing so.

Possibly the simplest is replacing one of the values with None before calling get_dummies. Say you have:

import pandas as pd
import numpy as np

s = pd.Series(list('babca'))

>> s
0    b
1    a
2    b
3    c
4    a

Then use:

>> pd.get_dummies(np.where(s == s.unique()[0], None, s))
   a  c
0  0  0
1  1  0
2  0  0
3  0  1
4  1  0

to drop b.

(Of course, you need to check whether your category column already contains None, since those rows would become indistinguishable from the dropped baseline.)
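A variation on the same idea lets you pick the baseline explicitly instead of taking the first unique value. This is a rough sketch, assuming the baseline name is known in advance (the variable baseline is hypothetical); it relies on get_dummies ignoring missing values when dummy_na is left at its default of False.

```python
import pandas as pd

s = pd.Series(list("babca"))

# Hypothetical choice of baseline category to drop.
baseline = "b"

# Series.where keeps values where the condition holds and replaces the
# rest with NaN, so every "b" becomes a missing value.
masked = s.where(s != baseline)

# With dummy_na=False (the default), missing values get no column of
# their own, so the baseline rows are all-zero in the result.
dummies = pd.get_dummies(masked)
print(dummies.columns.tolist())
```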

Another way is to use the prefix argument to get_dummies:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False)

prefix: string, list of strings, or dict of strings, default None
    String to append to DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.

This adds the prefix to the front of every resulting column name, so you can then drop one of the prefixed columns (just make sure the prefix is unique, so the name cannot clash with an existing column).
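The prefix approach might look like the sketch below. The prefix "letter" and the choice of "b" as the dropped category are assumptions made for illustration, and DataFrame.drop(columns=...) needs a reasonably recent pandas (0.21 or later).

```python
import pandas as pd

s = pd.Series(list("babca"))

# Hypothetical prefix; each dummy column is named "<prefix>_<value>".
dummies = pd.get_dummies(s, prefix="letter")

# Drop the column for the category chosen as the baseline.
reduced = dummies.drop(columns="letter_b")
print(reduced.columns.tolist())
```

Compared with masking values beforehand, this leaves the original data untouched and makes the dropped baseline explicit in the code.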

answered Oct 13 '22 19:10

Ami Tavory
