Pandas get_dummies on multiple columns

Tags: python, pandas

I have a dataset with multiple columns that I wish to one-hot encode. However, I don't want a separate encoding for each column, since the columns all draw on the same set of items. What I want is a single set of dummy variables shared across all the columns. See my code for a better explanation.

Suppose my dataframe looks like this:

In [103]: dum = pd.DataFrame({'ch1': ['A', 'C', 'A'], 'ch2': ['B', 'G', 'F'], 'ch3': ['C', 'D', 'E']})

In [104]: dum
Out[104]:
  ch1 ch2 ch3
0   A   B   C
1   C   G   D
2   A   F   E

If I execute

pd.get_dummies(dum)

The output will be

   ch1_A  ch1_C  ch2_B  ch2_F  ch2_G  ch3_C  ch3_D  ch3_E
0      1      0      1      0      0      1      0      0
1      0      1      0      0      1      0      1      0
2      1      0      0      1      0      0      0      1

However, what I would like to obtain is something like this:

 A B C D E F G
 1 1 1 0 0 0 0
 0 0 1 1 0 0 1
 1 0 0 0 1 1 0

Instead of having multiple columns representing the encoding, e.g. ch1_A and ch1_C, I only wish to have one group of columns (A, B, and so on), where a column is 1 when its value appears in any of ch1, ch2, ch3 for that row.

To clarify, in my original dataset a single row won't contain the same value (A, B, C...) more than once; each value appears in at most one of the columns.

595

asked Aug 26 '18 17:08

user3276768

2 Answers

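Using stack and str.get_dummies, then summing within each original row. Note that sum(level=0) was removed in pandas 2.0, so group on the index level instead:

dum.stack().str.get_dummies().groupby(level=0).sum()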
Out[938]: 
   A  B  C  D  E  F  G
0  1  1  1  0  0  0  0
1  0  0  1  1  0  0  1
2  1  0  0  0  1  1  0
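
To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps, using the dum frame from the question. The intermediate names (stacked, indicators) are only illustrative. It also swaps sum() for max() as an assumption-free variant: the result stays 0/1 even if a value were to repeat within a row, which the question says does not happen.

```python
import pandas as pd

dum = pd.DataFrame({'ch1': ['A', 'C', 'A'],
                    'ch2': ['B', 'G', 'F'],
                    'ch3': ['C', 'D', 'E']})

# Step 1: stack() reshapes the frame into one long Series indexed by
# (original row label, column name).
stacked = dum.stack()

# Step 2: str.get_dummies() builds one indicator column per distinct value,
# with one row per stacked entry.
indicators = stacked.str.get_dummies()

# Step 3: group on the first index level (the original row label) to
# collapse the entries back into one row per original row.
# max() keeps every cell at 0 or 1; sum() would give counts instead.
result = indicators.groupby(level=0).max()

print(result)
```

With the question's data, max() and sum() give the same table, because no value repeats within a row.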

172

answered Nov 06 '22 14:11

BENY

You could use pd.crosstab to create a frequency table:

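import pandas as pd

dum = pd.DataFrame({'ch1': ['A', 'C', 'A'], 'ch2': ['B', 'G', 'F'], 'ch3': ['C', 'D', 'E']})

# Reshape to one long Series indexed by (row label, column name)
stacked = dum.stack()
# Original row label for each stacked value
index = stacked.index.get_level_values(0)
# Count how often each value occurs in each original row
result = pd.crosstab(index=index, columns=stacked)
# Drop the axis names so they do not appear in the printed table
result.index.name = None
result.columns.name = None

print(result)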

yields

   A  B  C  D  E  F  G
0  1  1  1  0  0  0  0
1  0  0  1  1  0  0  1
2  1  0  0  0  1  1  0
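
Because pd.crosstab builds a frequency table, its cells are counts rather than indicators. The question states that a value never repeats within a row, so the counts are already 0 or 1 there. As a hedged sketch of what would happen otherwise, the hypothetical frame below (not from the question) repeats 'A' in row 0, and clip(upper=1) is one way to turn the counts back into indicators.

```python
import pandas as pd

# Hypothetical data, not from the question: row 0 contains 'A' twice.
dup = pd.DataFrame({'ch1': ['A', 'C'],
                    'ch2': ['A', 'G'],
                    'ch3': ['C', 'D']})

stacked = dup.stack()
index = stacked.index.get_level_values(0)

# crosstab counts occurrences, so the repeated 'A' is counted twice in row 0
counts = pd.crosstab(index=index, columns=stacked)
counts.index.name = None
counts.columns.name = None

# Cap every cell at 1 to get a 0/1 indicator table
indicators = counts.clip(upper=1)

print(counts)
print(indicators)
```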

answered Nov 06 '22 14:11

unutbu
