
Drop duplicates of one column based on value in another column, Python, Pandas

I have a dataframe like this:

Date                PlumeO      Distance
2014-08-13 13:48:00  754.447905 5.844577 
2014-08-13 13:48:00  754.447905 6.888653
2014-08-13 13:48:00  754.447905 6.938860
2014-08-13 13:48:00  754.447905 6.977284
2014-08-13 13:48:00  754.447905 6.946430 
2014-08-13 13:48:00  754.447905 6.345506
2014-08-13 13:48:00  754.447905 6.133567
2014-08-13 13:48:00  754.447905 5.846046 
2014-08-13 16:59:00  754.447905 6.345506 
2014-08-13 16:59:00  754.447905 6.694847 
2014-08-13 16:59:00  754.447905 5.846046 
2014-08-13 16:59:00  754.447905 6.977284 
2014-08-13 16:59:00  754.447905 6.938860 
2014-08-13 16:59:00  754.447905 5.844577 
2014-08-13 16:59:00  754.447905 6.888653 
2014-08-13 16:59:00  754.447905 6.133567 
2014-08-13 16:59:00  754.447905 6.946430

For each date, I'm trying to keep only the row with the smallest distance; that is, drop the duplicate dates and keep the one with the smallest distance.

Is there a way to achieve this in pandas' df.drop_duplicates or am I stuck using if statements to find the smallest distance?

Ahmed asked Jul 12 '17 13:07




1 Answer

Sort by Distance, then drop duplicate dates, keeping the first (smallest-distance) row for each date:

df.sort_values('Distance').drop_duplicates(subset='Date', keep='first')
Out: 
                   Date      PlumeO  Distance
0   2014-08-13 13:48:00  754.447905  5.844577
13  2014-08-13 16:59:00  754.447905  5.844577
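
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It assumes a small subset of the rows from the question (so the index labels differ from the output above), and the variable names `nearest` and `nearest_alt` are made up for illustration. The `groupby(...).idxmin()` variant is an alternative not mentioned in the answer, included only as a cross-check.

```python
import pandas as pd

# A subset of the rows from the question, rebuilt by hand (assumption: the
# Date column is parsed as datetime; plain strings would work the same way).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-08-13 13:48:00", "2014-08-13 13:48:00", "2014-08-13 13:48:00",
        "2014-08-13 16:59:00", "2014-08-13 16:59:00", "2014-08-13 16:59:00",
    ]),
    "PlumeO": [754.447905] * 6,
    "Distance": [5.844577, 6.888653, 5.846046, 6.345506, 5.844577, 6.133567],
})

# Approach from the answer: sort ascending by Distance, then keep the first
# row seen for each Date. The final sort only restores chronological order.
nearest = (
    df.sort_values("Distance")
      .drop_duplicates(subset="Date", keep="first")
      .sort_values("Date")
)

# Alternative (not from the answer): look up the index label of the minimum
# Distance within each Date group and select those rows.
nearest_alt = df.loc[df.groupby("Date")["Distance"].idxmin()]

print(nearest)
print(nearest_alt)
```

Both variants should select the same rows here. If two rows for the same date tie on Distance, which one survives the sort-based approach can depend on the sort algorithm; passing kind="mergesort" to sort_values makes the sort stable, so ties keep their original order.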
ayhan answered Nov 15 '22 22:11