Remove duplicate rows in pandas dataframe based on condition

Tags: python, pandas

            is_avail   valu data_source
2015-08-07     False  0.282    source_a
2015-08-07     False  0.582    source_b
2015-08-23     False  0.296    source_a
2015-09-08     False  0.433    source_a
2015-10-01      True  0.169    source_b

In the dataframe above, I want to remove the duplicate rows (i.e. rows where the index is repeated), keeping for each index value the row with the higher value in the valu column.

I can remove rows with duplicate indexes like this:

df = df[~df.index.duplicated()]

But that keeps the first occurrence regardless of valu. How do I remove duplicates based on the condition specified above?


asked May 05 '17 22:05

user308827

2 Answers

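You can sort the df by valu in descending order, group by the index, and take the first row of each group. Because the largest valu comes first within each group, that is the row that is kept.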

df.sort_values(by='valu', ascending=False).groupby(level=0).first()
Out[1277]: 
           is_avail   valu data_source
2015-08-07    False  0.582    source_b
2015-08-23    False  0.296    source_a
2015-09-08    False  0.433    source_a
2015-10-01     True  0.169    source_b
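To try this approach end to end, here is a minimal, self-contained sketch. It rebuilds the frame from the question by hand (the datetime index is an assumption; the question does not show the index dtype) and applies the same sort-then-groupby step.

```python
import pandas as pd

# Rebuild the frame from the question. Using a DatetimeIndex is an assumption;
# plain date strings as the index would behave the same way here.
df = pd.DataFrame(
    {
        "is_avail": [False, False, False, False, True],
        "valu": [0.282, 0.582, 0.296, 0.433, 0.169],
        "data_source": ["source_a", "source_b", "source_a", "source_a", "source_b"],
    },
    index=pd.to_datetime(
        ["2015-08-07", "2015-08-07", "2015-08-23", "2015-09-08", "2015-10-01"]
    ),
)

# Sort so the largest valu comes first, then keep the first row per index label.
result = df.sort_values(by="valu", ascending=False).groupby(level=0).first()
print(result)
```

One caveat worth knowing: groupby(...).first() returns the first non-null value per column, so if the data can contain NaN, values from different rows may be combined. In that case .groupby(level=0).head(1).sort_index() is likely the safer choice, since it keeps whole rows.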

answered Sep 21 '22 03:09

Allen

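Using drop_duplicates with keep='last': move the index into a column, sort by date and valu so that the highest valu is the last row for each date, drop the earlier duplicates, then restore the index.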

df.rename_axis('date').reset_index() \
    .sort_values(['date', 'valu']) \
    .drop_duplicates('date', keep='last') \
    .set_index('date').rename_axis(df.index.name)

           is_avail   valu data_source
2015-08-07    False  0.582    source_b
2015-08-23    False  0.296    source_a
2015-09-08    False  0.433    source_a
2015-10-01     True  0.169    source_b
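As a runnable sketch of the chain above, the snippet below rebuilds the question's frame (again assuming a datetime index) and also shows a shorter variant that is not from this answer: it extends the asker's own df.index.duplicated() idea by sorting on valu first and keeping the last occurrence.

```python
import pandas as pd

# Rebuild the frame from the question (the DatetimeIndex is an assumption).
df = pd.DataFrame(
    {
        "is_avail": [False, False, False, False, True],
        "valu": [0.282, 0.582, 0.296, 0.433, 0.169],
        "data_source": ["source_a", "source_b", "source_a", "source_a", "source_b"],
    },
    index=pd.to_datetime(
        ["2015-08-07", "2015-08-07", "2015-08-23", "2015-09-08", "2015-10-01"]
    ),
)

# The answer's chain: index -> column, sort, drop earlier duplicates, restore index.
chained = (
    df.rename_axis("date")
    .reset_index()
    .sort_values(["date", "valu"])
    .drop_duplicates("date", keep="last")
    .set_index("date")
    .rename_axis(df.index.name)
)

# Shorter variant: sort by valu ascending, then keep the last (largest valu)
# occurrence of each index label and put the rows back in index order.
ordered = df.sort_values("valu")
shortcut = ordered[~ordered.index.duplicated(keep="last")].sort_index()

print(chained)
print(shortcut)
```

Both produce one row per date. The shortcut avoids the reset_index/set_index round trip, at the cost of an extra sort_index() to restore date order.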

answered Sep 21 '22 03:09

piRSquared
