I have a dataset that contains duplicated pairs. Here's my data:
Id antecedent descendant
1 one two
2 two one
3 two three
4 one three
5 three two
Here's what I need: because the pair (one, two) is equal to (two, one), I want to remove the duplicate pair and keep only the first occurrence:
Id antecedent descendant
1 one two
3 two three
4 one three
Use numpy.sort to sort the values within each row, then use duplicated to build a boolean mask:
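import numpy as np
import pandas as pd

# sort the two columns within each row so that mirrored pairs become identical
df1 = pd.DataFrame(np.sort(df[['antecedent','descendant']], axis=1), index=df.index)
Or:
# slower solution
#df1 = df[['antecedent','descendant']].apply(frozenset, axis=1)
# keep only the first occurrence of each pair
df = df[~df1.duplicated()]
print(df)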
Id antecedent descendant
0 1 one two
2 3 two three
3 4 one three
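For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question, so it can be run as-is. The names sorted_pairs, mask and result are only illustrative, and passing index=df.index is an extra precaution (an assumption on my part, not required for this sample) so the mask still lines up if the frame does not have a default index.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "antecedent": ["one", "two", "two", "one", "three"],
    "descendant": ["two", "one", "three", "three", "two"],
})

# Sort the two values within each row, so (two, one) and (one, two)
# end up as the same ordered pair.
sorted_pairs = pd.DataFrame(
    np.sort(df[["antecedent", "descendant"]].to_numpy(), axis=1),
    index=df.index,  # keep the original index so the mask aligns with df
)

# duplicated() flags every repeat of a pair after its first occurrence.
mask = sorted_pairs.duplicated()

# Keep only the rows that are not flagged.
result = df[~mask]
print(result)
```

The rows with Id 2 and 5 are dropped because their pairs already appeared earlier in rows 1 and 3.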