Below is my data table, taken from my code output:
| columnA|ColumnB|ColumnC|
| ------ | ----- | ------|
| 12 | 8 | 1.34 |
| 8 | 12 | 1.34 |
| 1 | 7 | 0.25 |
I want to dedupe it and keep only:
| columnA|ColumnB|ColumnC|
| ------ | ----- | ------|
| 12 | 8 | 1.34 |
| 1 | 7 | 0.25 |
Usually, when I drop duplicates, I use .drop_duplicates(subset=). But this time I want to drop rows that hold the same pair in either order, i.e. treat (columnA, columnB) == (columnB, columnA) as a duplicate. I did some research and found that someone uses set((a,b) if a<=b else (b,a) for a,b in pairs) to remove such pairs from a list. But I don't know how to apply this method to my pandas DataFrame. Please help, and thank you in advance!
Convert the relevant columns to a frozenset per row, then drop the rows whose frozenset has already appeared:
out = df[~df[['columnA', 'ColumnB']].apply(frozenset, axis=1).duplicated()]
print(out)
# Output
   columnA  ColumnB  ColumnC
0       12        8     1.34
2        1        7     0.25
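Details: a set ignores the order of its elements, so both pairs compare equal. frozenset is the hashable variant of set, which duplicated() needs: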
>>> set([8, 12])
{8, 12}
>>> set([12, 8])
{8, 12}
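For anyone who wants to run this end to end, here is a minimal sketch of the frozenset approach. The DataFrame is rebuilt by hand from the table in the question, so the exact dtypes are an assumption; the names keys and mask are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "columnA": [12, 8, 1],
    "ColumnB": [8, 12, 7],
    "ColumnC": [1.34, 1.34, 0.25],
})

# One order-insensitive, hashable key per row.
keys = df[["columnA", "ColumnB"]].apply(frozenset, axis=1)

# duplicated() flags the second and later occurrences of each key.
mask = keys.duplicated()

# Keep only the first row of every unordered pair.
out = df[~mask]
print(out)
```

Because duplicated() keeps the first occurrence by default, the row (12, 8) survives and the later (8, 12) is dropped.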
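You can combine a and b into a sorted tuple and call drop_duplicates on the combined column:
# a and b stand for the two pair columns (columnA and ColumnB in the question).
# sorted() gives each pair one fixed order; tuple(set(row)) does not guarantee that.
t = df[["a", "b"]].apply(lambda row: tuple(sorted(row)), axis=1)
df.assign(t=t).drop_duplicates("t").drop(columns="t")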
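The a<=b ordering trick from the question can also be carried over to the DataFrame without a row-wise apply. The following is a sketch of that idea using NumPy's row-wise sort, not something taken from either answer; the sample data is rebuilt from the question's table and the names pairs and dup are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "columnA": [12, 8, 1],
    "ColumnB": [8, 12, 7],
    "ColumnC": [1.34, 1.34, 0.25],
})

# Sort within each row so (12, 8) and (8, 12) both become (8, 12).
pairs = np.sort(df[["columnA", "ColumnB"]].to_numpy(), axis=1)

# Flag rows whose sorted pair has already appeared, keeping df's index.
dup = pd.DataFrame(pairs, index=df.index).duplicated()

# The original column values are untouched; only the mask uses the sorted pairs.
out = df[~dup]
print(out)
```

This should scale better than apply on large frames, since the sorting and comparison run on whole arrays instead of one Python call per row.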