
Pandas drop duplicate pair data in different columns

Below is my data table, taken from my code's output:

| columnA|ColumnB|ColumnC|
| ------ | ----- | ------|
|   12   | 8     | 1.34  |
|   8    | 12    | 1.34  |
|   1    | 7     | 0.25  |

I want to dedupe it so that only these rows are left:

| columnA|ColumnB|ColumnC|
| ------ | ----- | ------|
|   12   | 8     | 1.34  |
|   1    | 7     | 0.25  |

Usually I drop duplicates with .drop_duplicates(subset=...). This time, though, I want to drop rows that contain the same pair in either order, i.e. treat (columnA, ColumnB) == (ColumnB, columnA) as a duplicate. I did some research and found that some people use set((a, b) if a <= b else (b, a) for a, b in pairs) to remove repeated pairs from a list, but I don't know how to apply this method to my pandas DataFrame. Please help, and thank you in advance!

Shawn11 asked Sep 16 '25 15:09


2 Answers

Convert each row's pair of columns to a frozenset, then drop the rows whose frozenset has already appeared:

out = df[~df[['columnA', 'ColumnB']].apply(frozenset, axis=1).duplicated()]
print(out)

# Output
   columnA  ColumnB  ColumnC
0       12        8     1.34
2        1        7     0.25

Details: a set ignores element order, so both orderings of a pair compare equal:

>>> set([8, 12])
{8, 12}

>>> set([12, 8])
{8, 12}
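
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the frozenset approach. The DataFrame is rebuilt by hand from the table in the question, and the names pair_key and out are only illustrative:

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "columnA": [12, 8, 1],
    "ColumnB": [8, 12, 7],
    "ColumnC": [1.34, 1.34, 0.25],
})

# Order-insensitive key per row: frozenset({12, 8}) == frozenset({8, 12}).
pair_key = df[["columnA", "ColumnB"]].apply(frozenset, axis=1)

# duplicated() marks every repeat after the first occurrence; ~ keeps the rest.
out = df[~pair_key.duplicated()]
print(out)
```

Note that apply with axis=1 runs a Python function once per row, so this is probably fine for small tables but may be slow on large ones.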
Corralien answered Sep 19 '25 08:09


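You can combine columnA and ColumnB into an order-independent tuple and call drop_duplicates on that combined column. Sorting the pair is safer than tuple(set(row)), because set iteration order is not guaranteed to match for two sets with the same elements:

# Sorted tuple: (12, 8) and (8, 12) both become (8, 12)
t = df[["columnA", "ColumnB"]].apply(lambda row: tuple(sorted(row)), axis=1)
# Deduplicate on the helper column, then drop it again
out = df.assign(t=t).drop_duplicates("t").drop(columns="t")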
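
If the row-wise apply turns out to be slow, a vectorized variant of the same idea is to sort each pair with NumPy and look for duplicated rows. This is a sketch of a different technique (numpy.sort along axis 1), not what either answer above uses, and it assumes the two columns hold mutually comparable values such as numbers. The names pair_cols, sorted_pairs and deduped are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "columnA": [12, 8, 1],
    "ColumnB": [8, 12, 7],
    "ColumnC": [1.34, 1.34, 0.25],
})

pair_cols = ["columnA", "ColumnB"]

# Sort within each row so (12, 8) and (8, 12) both become (8, 12).
sorted_pairs = pd.DataFrame(
    np.sort(df[pair_cols].to_numpy(), axis=1),
    index=df.index,  # keep the original index so the mask aligns with df
)

# Keep the first row of each unordered pair.
deduped = df[~sorted_pairs.duplicated()]
print(deduped)
```

This mirrors the (a, b) if a <= b else (b, a) trick from the question: every pair is put into a canonical order before being compared.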
Code Different answered Sep 19 '25 07:09