This seems simple, but I cannot find any information on it online.
I have a dataframe like below:
City     State  Zip         Date        Description
Earlham  IA     50072-1036  2014-10-10  Postmarket Assurance: Devices
Earlham  IA     50072-1036  2014-10-10  Compliance: Devices
Madrid   IA     50156-1748  2014-09-10  Drug Quality Assurance
How can I eliminate rows that are duplicates in four of the five columns, with Description being the only column allowed to differ?
The result would be:
City     State  Zip         Date        Description
Earlham  IA     50072-1036  2014-10-10  Postmarket Assurance: Devices
Madrid   IA     50156-1748  2014-09-10  Drug Quality Assurance
I found online that drop_duplicates with the subset parameter could work, but I am unsure how to apply it to multiple columns.
You've actually found the solution. For multiple columns, subset will be a list.
df.drop_duplicates(subset=['City', 'State', 'Zip', 'Date'])
Or, just by stating the column to be ignored:
df.drop_duplicates(subset=df.columns.difference(['Description']))
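A minimal runnable sketch, reconstructing the DataFrame from the question, showing that both forms give the same result (keep='first', the default, retains the first row of each duplicate group; keep='last' would retain 'Compliance: Devices' instead):

```python
import pandas as pd

# Reconstruct the example DataFrame from the question.
df = pd.DataFrame({
    'City': ['Earlham', 'Earlham', 'Madrid'],
    'State': ['IA', 'IA', 'IA'],
    'Zip': ['50072-1036', '50072-1036', '50156-1748'],
    'Date': ['2014-10-10', '2014-10-10', '2014-09-10'],
    'Description': ['Postmarket Assurance: Devices',
                    'Compliance: Devices',
                    'Drug Quality Assurance'],
})

# Explicit subset list: rows matching on all four columns are duplicates,
# so the 'Compliance: Devices' row is dropped.
a = df.drop_duplicates(subset=['City', 'State', 'Zip', 'Date'])

# Equivalent: every column except Description. Column order in subset
# does not affect which rows are dropped.
b = df.drop_duplicates(subset=df.columns.difference(['Description']))

print(a)
```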