Python - Drop duplicate based on max value of a column

I am not very experienced with pandas, but I think it can solve my problem. I have a text file that contains data in the form id1;id2;value1;value2;value3:

1;2;30;40;20.3;
1;2;30;42;26.2;
3;5;12;55;10.7;
3;5;12;23;8.7;
3;5;12;33;11.2;
24;12;1;553;1.1;
24;12;1;23;1.9;

For each group of rows with equal id1, id2 and value1, I want to keep only the row with the highest value3. Value2 is not used in the comparison, but it needs to be kept in the result. The expected output is:

1;2;30;42;26.2;
3;5;12;33;11.2;
24;12;1;23;1.9;
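To reproduce this, the rows first have to be loaded into a DataFrame named df. The following is a minimal sketch, assuming the file is read with pandas.read_csv. An in-memory string stands in for the real file path. Because every line ends with ';', pandas sees an empty sixth field. That field is named 'a' here, which is an assumed name chosen to match the extra all-NaN column in the answer's printed output.

```python
import io

import pandas as pd

# Sample rows from the question; with a real file, pass its path instead of io.StringIO(raw).
raw = """1;2;30;40;20.3;
1;2;30;42;26.2;
3;5;12;55;10.7;
3;5;12;23;8.7;
3;5;12;33;11.2;
24;12;1;553;1.1;
24;12;1;23;1.9;
"""

# The trailing ';' on each line produces a sixth, empty field.
# 'a' is an assumed name for it (it is read as NaN).
df = pd.read_csv(io.StringIO(raw), sep=';', header=None,
                 names=['id1', 'id2', 'value1', 'value2', 'value3', 'a'])
print(df)
```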

asked Feb 16 '17 07:02

krizz

1 Answer

You need DataFrameGroupBy.idxmax to get the index label of the maximum value3 in each group, and then select those rows from the DataFrame with loc:

print (df.groupby(['id1','id2','value1']).value3.idxmax())
id1  id2  value1
1    2    30        1
3    5    12        4
24   12   1         6
Name: value3, dtype: int64

df = df.loc[df.groupby(['id1','id2','value1']).value3.idxmax()]
print (df)
   id1  id2  value1  value2  value3   a
1    1    2      30      42    26.2 NaN
4    3    5      12      33    11.2 NaN
6   24   12       1      23     1.9 NaN
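Here is a self-contained version of the idxmax approach for experimenting. It is a sketch built from the question's sample rows, with the empty trailing column left out. Note that idxmax returns index labels rather than positions, so df.loc selects the intended rows only when the index is unique. On ties, idxmax picks the first row with the maximum.

```python
import pandas as pd

# The question's sample rows (the empty trailing column is omitted).
df = pd.DataFrame({
    'id1':    [1, 1, 3, 3, 3, 24, 24],
    'id2':    [2, 2, 5, 5, 5, 12, 12],
    'value1': [30, 30, 12, 12, 12, 1, 1],
    'value2': [40, 42, 55, 23, 33, 553, 23],
    'value3': [20.3, 26.2, 10.7, 8.7, 11.2, 1.1, 1.9],
})

keys = ['id1', 'id2', 'value1']

# Guarantee unique index labels (e.g. after pd.concat the index may repeat);
# the default RangeIndex here is already unique.
df = df.reset_index(drop=True)

# For each key combination, the label of the row with the largest value3.
best = df.groupby(keys)['value3'].idxmax()

# Select those whole rows, so value2 is kept alongside the maximum.
result = df.loc[best]
print(result)
```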

Another possible solution is to sort_values by column value3 in descending order and then groupby with GroupBy.first:

df = (df.sort_values('value3', ascending=False)
        .groupby(['id1','id2','value1'], sort=False)
        .first()
        .reset_index())
print (df)
   id1  id2  value1  value2  value3   a
0    1    2      30      42    26.2 NaN
1    3    5      12      33    11.2 NaN
2   24   12       1      23     1.9 NaN
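GroupBy.first returns the first non-null value of each column within a group. If some cells are missing, a result row can therefore combine values from different original rows. One alternative, not part of the answer above, is to sort the same way and then call DataFrame.drop_duplicates, which keeps or drops whole rows. The sketch below uses the question's sample rows, and sort_index restores the original row order.

```python
import pandas as pd

# The question's sample rows (the empty trailing column is omitted).
df = pd.DataFrame({
    'id1':    [1, 1, 3, 3, 3, 24, 24],
    'id2':    [2, 2, 5, 5, 5, 12, 12],
    'value1': [30, 30, 12, 12, 12, 1, 1],
    'value2': [40, 42, 55, 23, 33, 553, 23],
    'value3': [20.3, 26.2, 10.7, 8.7, 11.2, 1.1, 1.9],
})

keys = ['id1', 'id2', 'value1']

# Put the largest value3 first, keep the first whole row for each key
# combination, then return to the original row order.
result = (df.sort_values('value3', ascending=False)
            .drop_duplicates(subset=keys, keep='first')
            .sort_index())
print(result)
```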

answered Sep 22 '22 11:09

jezrael
