removing redundant columns when using get_dummies

Tags: python, pandas, categorical-data

Hi, I have a pandas dataframe df containing categorical variables:

df=pandas.DataFrame(data=[['male','blue'],['female','brown'],
['male','black']],columns=['gender','eyes'])

df
Out[16]: 
   gender   eyes
0    male   blue
1  female  brown
2    male  black

Using the function get_dummies, I get the following dataframe:

df_dummies = pandas.get_dummies(df)

df_dummies
Out[18]: 
   gender_female  gender_male  eyes_black  eyes_blue  eyes_brown
0              0            1           0          1           0
1              1            0           0          0           1
2              0            1           1          0           0
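The 0/1 output above comes from an older pandas release. Since pandas 2.0, get_dummies returns boolean (True/False) columns by default, so on a current version the same call prints True/False instead. A minimal sketch that passes get_dummies' existing dtype parameter to reproduce the integer output shown:

```python
import pandas as pd

df = pd.DataFrame(data=[['male', 'blue'], ['female', 'brown'], ['male', 'black']],
                  columns=['gender', 'eyes'])

# dtype=int forces 0/1 integer dummies regardless of the version's default dtype
df_dummies = pd.get_dummies(df, dtype=int)
print(df_dummies)
```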

However, the columns gender_female and gender_male contain the same information, because the original column can take only two values. Is there a (smart) way to keep only one of the two columns?
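To see why one of the two gender columns is redundant: for a binary variable the two dummies sum to 1 in every row, so either column can be recovered from the other. A small check on the same example data:

```python
import pandas as pd

df = pd.DataFrame(data=[['male', 'blue'], ['female', 'brown'], ['male', 'black']],
                  columns=['gender', 'eyes'])
df_dummies = pd.get_dummies(df, dtype=int)

# Exactly one gender dummy is set per row, so gender_female == 1 - gender_male
recovered_female = 1 - df_dummies['gender_male']
print((recovered_female == df_dummies['gender_female']).all())
```

The same holds for any variable with k categories: its k dummies sum to 1 in every row, which is why drop_first=True also drops eyes_black.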

UPDATED

The use of

df_dummies = pandas.get_dummies(df,drop_first=True)

would give me

df_dummies
Out[21]: 
   gender_male  eyes_blue  eyes_brown
0            1          1           0
1            0          0           1
2            1          0           0

but I would like to drop a dummy column only for the variables that originally had just two possible values (here, gender), and keep all the dummies for the other variables.

The desired result is:

df_dummies
Out[18]: 
   gender_male  eyes_black  eyes_blue  eyes_brown
0            1           0          1           0
1            0           0          0           1
2            1           1          0           0


asked May 04 '18 13:05

gabboshow

1 Answer

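Alternatively, you can split the dataframe into the columns to which you want to apply drop_first=True and the columns to which you don't, encode each part separately, and then concatenate the results.

import pandas as pd

# Binary columns (in the question's df, only column 0, 'gender'): drop the redundant dummy
df1 = pd.get_dummies(df.iloc[:, 0:1], drop_first=True)
# Remaining categorical columns (here 'eyes'): keep every dummy
df2 = pd.get_dummies(df.iloc[:, 1:])

df_dummies = pd.concat([df1, df2], axis=1)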
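The split above relies on knowing in advance which columns are binary. A minimal sketch that picks them automatically, assuming "binary" means exactly two distinct non-null values per column (Series.nunique), then applies drop_first=True only to those columns:

```python
import pandas as pd

df = pd.DataFrame(data=[['male', 'blue'], ['female', 'brown'], ['male', 'black']],
                  columns=['gender', 'eyes'])

# Columns with exactly two distinct (non-null) values carry one redundant dummy
binary_cols = [col for col in df.columns if df[col].nunique() == 2]
other_cols = [col for col in df.columns if col not in binary_cols]

parts = []
if binary_cols:
    # Drop the first (redundant) dummy of each binary column
    parts.append(pd.get_dummies(df[binary_cols], drop_first=True, dtype=int))
if other_cols:
    # Keep every dummy for the remaining columns
    parts.append(pd.get_dummies(df[other_cols], dtype=int))

df_dummies = pd.concat(parts, axis=1)
print(df_dummies)
```

The binary columns end up first in the result; reindex df_dummies if the original column order matters.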

answered Sep 29 '22 11:09

David LE
