Create dummies from column with multiple values in pandas

I am looking for a Pythonic way to handle the following problem.

The pandas.get_dummies() function is great for creating dummies from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.

Now I need to handle a different situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, but I only want 4 of them (A, B, C, D), so that a row can have multiple 1s. For example, a row with 'A*C' should have a 1 in both the A and the C column.
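To make the problem concrete, here is a minimal sketch of the behaviour described above. The DataFrame is made up for illustration; only the column name 'label' and its values come from the description.

```python
import pandas as pd

# Hypothetical data matching the values described above.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# get_dummies() treats every distinct string as its own category,
# so the combined values 'A*C' and 'C*D' each get a separate column.
dummies = pd.get_dummies(df["label"])

print(dummies.shape[1])        # number of dummy columns created
print(list(dummies.columns))   # includes 'A*C' and 'C*D' as categories
```

Each row ends up with exactly one indicator set, which is why the combined labels are not spread across the A, B, C and D columns.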

Is there a Pythonic way to do this? I could only think of a step-by-step algorithm, but that would not use get_dummies(). Thanks

Edited, hope it is more clear!

asked Sep 19 '13 08:09

mkln

1 Answer

I know it's been a while since this question was asked, but there is (at least now there is) a one-liner that is supported by the documentation:

In [4]: df
Out[4]:
       label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
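For the strings in the question, which are already separated by '*', the join step should not be needed. Below is a minimal, self-contained sketch of both cases. The data is made up to match the question and the session above, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical data: the '*'-separated strings from the question.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Series.str.get_dummies splits each string on the separator and
# creates one indicator column per distinct token.
dummies = df["label"].str.get_dummies(sep="*")
print(dummies)

# Tuple-valued labels, as in the session above: join each tuple into a
# single '*'-separated string first, then split it back into dummies.
tuples = pd.Series([("a", "c", "e"), ("a", "d"), ("b",), ("d", "e")], name="label")
tuple_dummies = tuples.str.join(sep="*").str.get_dummies(sep="*")
print(tuple_dummies)
```

Note that the separator must not occur inside any individual label, otherwise that label would be split into several columns.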

answered Oct 02 '22 19:10

offbyone

Tags: python, pandas, categorical-data, dummy-data