How do I remove rows with duplicate column values in a pandas DataFrame?

Tags:

python

pandas

I have a pandas data frame which looks like this.

      Column1  Column2 Column3
    0     cat        1       C
    1     dog        1       A
    2     cat        1       B

I want to detect that rows 0 and 2 repeat the same values in Column1 and Column2, then remove the repeated record and keep only the first one. The resulting data frame should contain only:

      Column1  Column2 Column3
    0     cat        1       C
    1     dog        1       A

374

asked Jun 16 '18 04:06

Sayonti

2 Answers

Use drop_duplicates: pass subset a list of the columns to check for duplicates, and keep='first' to keep the first row of each set of duplicates.

If dataframe is:

    import pandas as pd

    df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                       'Column2': ["'bat'", "'flower'", "'bat'"],
                       'Column3': ["'xyz'", "'abc'", "'lmn'"]})
    print(df)

Result:

      Column1   Column2 Column3
    0   'cat'     'bat'   'xyz'
    1   'toy'  'flower'   'abc'
    2   'cat'     'bat'   'lmn'

Then:

    result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
    print(result_df)

Result:

      Column1   Column2 Column3
    0   'cat'     'bat'   'xyz'
    1   'toy'  'flower'   'abc'
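If you only want to identify the repeated rows before dropping anything, duplicated takes the same subset and keep arguments and returns a boolean mask. A minimal sketch, rebuilding the example frame so the snippet runs on its own (dup_mask and filtered_df are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})

# True for every row that repeats an earlier (Column1, Column2) pair
dup_mask = df.duplicated(subset=['Column1', 'Column2'], keep='first')
print(dup_mask.tolist())  # [False, False, True]

# Selecting the rows that are not flagged matches what drop_duplicates returns
filtered_df = df[~dup_mask]
print(filtered_df)
```

The mask is handy when you want to inspect or count the duplicates (for example with dup_mask.sum()) before deciding to remove them.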

163

answered Sep 23 '22 02:09

student

    import pandas as pd

    df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                       "Column2": [1, 1, 1],
                       "Column3": ["C", "A", "B"]})

    # Keep only the first row for each distinct value of Column1
    df = df.drop_duplicates(subset=['Column1'], keep='first')
    print(df)
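For comparison, keep also accepts 'last' and False. A rough sketch on the same data as the question, assuming you want to see which rows survive in each case (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                   "Column2": [1, 1, 1],
                   "Column3": ["C", "A", "B"]})

# keep='first' keeps rows 0 and 1 (the first cat and the dog)
first = df.drop_duplicates(subset=["Column1"], keep="first")

# keep='last' keeps rows 1 and 2 (the dog and the last cat)
last = df.drop_duplicates(subset=["Column1"], keep="last")

# keep=False drops every row whose Column1 value is repeated, leaving row 1
no_repeats = df.drop_duplicates(subset=["Column1"], keep=False)

# The original index labels are preserved; reset_index renumbers them from 0
last_renumbered = last.reset_index(drop=True)

print(first, last, no_repeats, last_renumbered, sep="\n\n")
```

Note that drop_duplicates returns a new frame and leaves df unchanged unless you assign the result back.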

answered Sep 23 '22 02:09

zafrin
