I am not really good with pandas, and I think pandas should solve my problem:
I have a text file that contains data (id1;id2;value1;value2;value3):
1;2;30;40;20.3;
1;2;30;42;26.2;
3;5;12;55;10.7;
3;5;12;23;8.7;
3;5;12;33;11.2;
24;12;1;553;1.1;
24;12;1;23;1.9;
As a result, among the lines that have equal id1, id2, and value1, I want to keep only the one with the highest value3. value2 is not used for the comparison, but it needs to be kept, e.g.
1;2;30;42;26.2;
3;5;12;33;11.2;
24;12;1;23;1.9;
1. If you want to remove all duplicates but keep the highest values, you can apply the formula =MAX(IF($A$2:$A$12=D2,$B$2:$B$12)); remember to press Ctrl + Shift + Enter.
2. In the above formula, A2:A12 is the original list you need to remove duplicates from.
To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T; this removes all columns that have the same data, regardless of column names.
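As a small illustration of the transpose trick, here is a sketch on a made-up frame; the column names a, b, and c are invented for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical frame: columns "a" and "b" hold identical data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})

# Transpose so columns become rows, drop duplicate rows, transpose back.
deduped = df.T.drop_duplicates().T

print(list(deduped.columns))
```

Note that this works on columns, not rows, so it does not address the question above. Transposing a frame with mixed column types also tends to upcast everything to object, so it is probably best kept for small, uniformly typed frames.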
Pandas drop_duplicates() function syntax: keep accepts {'first', 'last', False}, default 'first'. If 'first', all duplicate rows except the first one are deleted. If 'last', all duplicate rows except the last one are deleted. If False, all duplicate rows are deleted.
By using the pandas.DataFrame.drop_duplicates() method you can drop duplicate rows from a DataFrame. You can drop rows that are duplicated on selected columns or on all columns.
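To make the keep options concrete, here is a minimal sketch on the sample rows from the question. The frame is built by hand rather than read from the file, and the variable names (keys, first_rows, best, and so on) are only illustrative.

```python
import pandas as pd

# Sample rows from the question, typed in by hand.
df = pd.DataFrame(
    [
        [1, 2, 30, 40, 20.3],
        [1, 2, 30, 42, 26.2],
        [3, 5, 12, 55, 10.7],
        [3, 5, 12, 23, 8.7],
        [3, 5, 12, 33, 11.2],
        [24, 12, 1, 553, 1.1],
        [24, 12, 1, 23, 1.9],
    ],
    columns=["id1", "id2", "value1", "value2", "value3"],
)
keys = ["id1", "id2", "value1"]

# Rows count as duplicates when they agree on the key columns only.
first_rows = df.drop_duplicates(subset=keys, keep="first")
last_rows = df.drop_duplicates(subset=keys, keep="last")
no_dupes = df.drop_duplicates(subset=keys, keep=False)

# Sorting by value3 first makes "last" mean "highest value3 in the group".
best = df.sort_values("value3").drop_duplicates(subset=keys, keep="last")

print(best.sort_index())
```

On this particular data keep='last' alone happens to return the wanted rows, because the highest value3 is the last row of every group. That is a coincidence of the sample, so sorting by value3 first is the safer choice.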
You need DataFrameGroupBy.idxmax to get the index labels of the maximum value3 in each group, and then select those rows from the DataFrame with loc:
print(df.groupby(['id1', 'id2', 'value1']).value3.idxmax())
id1  id2  value1
1    2    30        1
3    5    12        4
24   12   1         6
Name: value3, dtype: int64
df = df.loc[df.groupby(['id1', 'id2', 'value1']).value3.idxmax()]
print(df)
   id1  id2  value1  value2  value3   a
1    1    2      30      42    26.2 NaN
4    3    5      12      33    11.2 NaN
6   24   12       1      23     1.9 NaN
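The answer does not show how the file was loaded, so the following is a self-contained sketch of the whole idxmax approach under a few assumptions: the data is read from an in-memory string instead of the real file, and the extra column name a is a guess at how the trailing semicolon on each line ended up as an all-NaN column in the output above.

```python
import io

import pandas as pd

# Stand-in for the text file from the question.
raw = """1;2;30;40;20.3;
1;2;30;42;26.2;
3;5;12;55;10.7;
3;5;12;23;8.7;
3;5;12;33;11.2;
24;12;1;553;1.1;
24;12;1;23;1.9;
"""

# The trailing ';' yields an empty sixth field; "a" is an assumed name for it.
cols = ["id1", "id2", "value1", "value2", "value3", "a"]
df = pd.read_csv(io.StringIO(raw), sep=";", names=cols)

# Index label of the row with the largest value3 in each group.
idx = df.groupby(["id1", "id2", "value1"])["value3"].idxmax()

# Select those rows; every column, including value2, is kept.
result = df.loc[idx]
print(result)
```

If the empty column is not wanted, one option is to drop it afterwards with result.drop(columns="a").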
Another possible solution is to sort with sort_values by column value3 in descending order and then use groupby with GroupBy.first:
df = (df.sort_values('value3', ascending=False)
        .groupby(['id1', 'id2', 'value1'], sort=False)
        .first()
        .reset_index())
print(df)
   id1  id2  value1  value2  value3   a
0    1    2      30      42    26.2 NaN
1    3    5      12      33    11.2 NaN
2   24   12       1      23     1.9 NaN
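One caveat with the sort-and-first approach: GroupBy.first takes the first non-null value of each column separately, so if a column contains missing values the result can combine values from different rows. The sketch below uses made-up data with a missing value2 (not from the question) to show the difference from GroupBy.head(1), which returns whole rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the row with the highest value3 has no value2.
df = pd.DataFrame(
    {
        "id1": [1, 1],
        "value2": [np.nan, 40.0],
        "value3": [26.2, 20.3],
    }
)

ordered = df.sort_values("value3", ascending=False)

# first() skips NaN per column, so value2 comes from the other row.
mixed = ordered.groupby("id1", sort=False).first().reset_index()

# head(1) keeps the first row of each group intact, NaN included.
whole = ordered.groupby("id1", sort=False).head(1)

print(mixed)
print(whole)
```

With the data from the question there are no missing values in value2 or value3, so both variants give the same rows there.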