Pandas: How to merge rows based on alternate column values?

I have a dataframe that contains some similar rows.

e.g. df:

Dist                Id         ID2         ID3      Values
1.309511252         1       4950005568  4865005556   3
0.239604736         2       13077506433 13062506433  4
0.239604736         2       13062506433 13077506433  4
0.230578014         3       4990001482  4880017235   4
0.230578014         3       4880017235  4990001482   4
0.199825732         4       5065006006  4950005965   5
0.199825732         4       4950005965  5065006006   5

As you can see, rows 2 & 3, 4 & 5, and 6 & 7 contain the same values, except that the ID2 and ID3 columns are swapped.

I want to remove those duplicate rows, keeping only one row from each pair, while keeping any row that has no swapped twin (in this case, row 1).

I want the output to be:

Dist                Id         ID2         ID3          Values
1.309511252         1       4950005568  4865005556      3
0.239604736         2       13062506433 13077506433     4   
0.230578014         3       4880017235  4990001482      4
0.199825732         4       4950005965  5065006006      5
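To experiment with this, the sample frame can be rebuilt and the swapped pairs flagged. This is a minimal sketch, assuming pandas and NumPy are installed and that "duplicate" means two rows share the same unordered (ID2, ID3) pair; the mirrored column name is hypothetical and added only for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# Order-independent key: (smaller ID, larger ID), so a row and its
# swapped twin produce the same key.
key = pd.DataFrame({
    "lo": np.minimum(df["ID2"], df["ID3"]),
    "hi": np.maximum(df["ID2"], df["ID3"]),
})

# True for every row whose unordered (ID2, ID3) pair occurs more than once.
df["mirrored"] = key.duplicated(keep=False)
print(df)
```

Only row 1 (index 0) should come out with mirrored set to False.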
asked Dec 09 '25 01:12 by L Lawliet


1 Answer

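You can simply group by Id and pick the last row from every group using tail(1). This works here because each pair of swapped rows shares the same Id, and the row you want to keep happens to be the last one in its group.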

In [831]: df = df.groupby('Id').tail(1).reset_index(drop=True)

In [832]: df
Out[832]: 
       Dist  Id          ID2          ID3  Values
0  1.309511   1   4950005568   4865005556       3
1  0.239605   2  13062506433  13077506433       4
2  0.230578   3   4880017235   4990001482       4
3  0.199826   4   4950005965   5065006006       5
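If swapped rows did not happen to share an Id, grouping on Id would not catch them. A sketch of an alternative that looks only at ID2 and ID3, assuming the duplicates differ purely by swapping those two columns: build a sorted (min, max) key per row and drop duplicates on it. The helper column names _lo and _hi are arbitrary, and keep="last" is chosen only so the result matches the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# Sort each (ID2, ID3) pair so a row and its swapped twin share one key.
df["_lo"] = np.minimum(df["ID2"], df["ID3"])
df["_hi"] = np.maximum(df["ID2"], df["ID3"])

# Keep the last row of each pair (same choice as tail(1) above),
# then drop the helper columns and renumber the index.
out = (
    df.drop_duplicates(subset=["_lo", "_hi"], keep="last")
      .drop(columns=["_lo", "_hi"])
      .reset_index(drop=True)
)
print(out)
```

Rows without a twin, like Id 1, pass through unchanged because their key is unique; use keep="first" instead if the first row of each pair should survive.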
answered Dec 11 '25 14:12 by Mayank Porwal