Pandas: How to merge rows based on alternate column values?

Question

I have a DataFrame that contains some near-duplicate rows.

For example, df:

Dist                Id         ID2         ID3      Values
1.309511252         1       4950005568  4865005556   3
0.239604736         2       13077506433 13062506433  4
0.239604736         2       13062506433 13077506433  4
0.230578014         3       4990001482  4880017235   4
0.230578014         3       4880017235  4990001482   4
0.199825732         4       5065006006  4950005965   5
0.199825732         4       4950005965  5065006006   5

As you can see, rows 2 & 3, 4 & 5, and 6 & 7 hold the same values, with only the ID2 and ID3 columns interchanged.

I want to remove those duplicate rows but keep any row that has no duplicate (in this case, row 1).

I want output as:

Dist                Id         ID2         ID3          Values
1.309511252         1       4950005568  4865005556      3
0.239604736         2       13062506433 13077506433     4   
0.230578014         3       4880017235  4990001482      4
0.199825732         4       4950005965  5065006006      5

Mayank Porwal · Accepted Answer

Because each pair of swapped rows shares the same Id, you can group by Id and keep the last row of every group using tail(1).

In [831]: df = df.groupby('Id').tail(1).reset_index(drop=True)

In [832]: df
Out[832]: 
       Dist  Id          ID2          ID3  Values
0  1.309511   1   4950005568   4865005556       3
1  0.239605   2  13062506433  13077506433       4
2  0.230578   3   4880017235   4990001482       4
3  0.199826   4   4950005965   5065006006       5
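
The groupby approach depends on Id already marking which rows belong together. If that were not the case, one possible alternative (a sketch, not part of the accepted answer) is to build an order-independent key by sorting ID2 and ID3 within each row and then dropping duplicates on that key. The helper column names lo and hi are made up for illustration, and the sample frame is retyped from the question.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Dist": [1.309511252, 0.239604736, 0.239604736, 0.230578014,
             0.230578014, 0.199825732, 0.199825732],
    "Id": [1, 2, 2, 3, 3, 4, 4],
    "ID2": [4950005568, 13077506433, 13062506433, 4990001482,
            4880017235, 5065006006, 4950005965],
    "ID3": [4865005556, 13062506433, 13077506433, 4880017235,
            4990001482, 4950005965, 5065006006],
    "Values": [3, 4, 4, 4, 4, 5, 5],
})

# Approach from the answer: one row per Id, keeping the last one.
by_id = df.groupby("Id").tail(1).reset_index(drop=True)

# Alternative sketch: sort (ID2, ID3) within each row so that swapped
# pairs produce the same key, regardless of what Id contains.
pair = np.sort(df[["ID2", "ID3"]].to_numpy(), axis=1)

# "lo" and "hi" are hypothetical helper columns used only for the key.
key = df.drop(columns=["ID2", "ID3"]).assign(lo=pair[:, 0], hi=pair[:, 1])

# keep="last" mirrors tail(1): the later row of each swapped pair survives.
by_pair = df[~key.duplicated(keep="last")].reset_index(drop=True)

print(by_pair)
```

On this sample both approaches keep the same four rows. The key-based version compares Dist with exact float equality, which is fine here because the duplicated rows carry identical values, but it may need rounding on data where they differ slightly.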

Tags:

python

python-3.x

pandas

dataframe

L Lawliet