Find symmetric pairs quickly in numpy

Question

from itertools import product
import pandas as pd

df = pd.DataFrame.from_records(product(range(10), range(10)))
df = df.sample(90)
df.columns = "c1 c2".split()
df = df.sort_values(df.columns.tolist()).reset_index(drop=True)
#     c1  c2
# 0    0   0
# 1    0   1
# 2    0   2
# 3    0   3
# 4    0   4
# ..  ..  ..
# 85   9   4
# 86   9   5
# 87   9   7
# 88   9   8
# 89   9   9
# 
# [90 rows x 2 columns]

How do I quickly find and identify all symmetric pairs in this data frame, and remove the last duplicate of each?

An example of a symmetric pair: '(0, 1)' is considered equal to '(1, 0)', so the latter should be removed.
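To make "symmetric duplicate" concrete: sorting each row turns both (0, 1) and (1, 0) into (0, 1), so any row whose sorted form already appeared earlier is a later duplicate. A minimal sketch on a small hand-made frame (the toy data is assumed for illustration), using pandas' `DataFrame.duplicated`, which by default flags every occurrence after the first:

```python
import numpy as np
import pandas as pd

# Toy frame, assumed for illustration: (1, 0) mirrors (0, 1), (2, 0) mirrors (0, 2).
df = pd.DataFrame({"c1": [0, 0, 1, 2, 2], "c2": [1, 2, 0, 0, 2]})

# Sort within each row so that (a, b) and (b, a) get the same key.
key = np.sort(df.to_numpy(), axis=1)

# True for every row whose sorted key already appeared earlier (keep="first" is the default).
is_later_duplicate = pd.DataFrame(key).duplicated().to_numpy()

print(df[is_later_duplicate])   # the mirrored rows (1, 0) and (2, 0)
print(df[~is_later_duplicate])  # the rows that survive: (0, 1), (0, 2), (2, 2)
```

A self-pair such as (2, 2) is its own mirror, so it is only flagged if it occurs more than once.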

The algorithm must be fast, so using numpy is recommended. Converting to Python objects is not allowed.

Quang Hoang · Accepted Answer

You can sort the values within each row, then group by the sorted pair:

import numpy as np
a = np.sort(df.to_numpy(), axis=1)  # sort within each row: (1, 0) -> (0, 1)
df.groupby([a[:, 0], a[:, 1]], as_index=False, sort=False).first()
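Since the question asks for numpy, here is a sketch of a numpy-only variant. It is not part of the accepted answer: it swaps the groupby for `np.unique(..., axis=0, return_index=True)`, which returns the index of the first occurrence of each distinct row. The array `pairs` is assumed toy data; in the question's setting it would be `df.to_numpy()`.

```python
import numpy as np

# Toy pairs, assumed for illustration; one row per (c1, c2).
pairs = np.array([[0, 1], [0, 2], [1, 0], [2, 0], [2, 2]])

# Row-wise sort makes (a, b) and (b, a) identical.
key = np.sort(pairs, axis=1)

# Index of the first occurrence of each distinct sorted row.
# np.unique orders the unique rows lexicographically, so the indices
# are re-sorted to restore the original row order.
_, first_idx = np.unique(key, axis=0, return_index=True)
keep = np.sort(first_idx)

deduped = pairs[keep]
print(deduped)
```

With a DataFrame, `df.iloc[keep]` would return the surviving rows with their original index labels.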

Option 2: If there are many distinct (c1, c2) pairs, groupby can be slow. In that case, assign the sorted values as new columns and filter with drop_duplicates:

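import numpy as np

a = np.sort(df.to_numpy(), axis=1)

(df.assign(one=a[:, 0], two=a[:, 1])   # temporary key columns; any unused names work
   .drop_duplicates(['one', 'two'])    # keeps the first row of each unordered pair
   .reindex(df.columns, axis=1)        # drops the temporary columns again
)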
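One way to sanity-check Option 2 on data shaped like the question's: regenerate the frame with a fixed `random_state` (an assumption, so the run is reproducible), apply the `drop_duplicates` chain, and assert that every unordered pair survives exactly once.

```python
from itertools import product

import numpy as np
import pandas as pd

# Same construction as in the question, with random_state fixed (assumed) for reproducibility.
df = pd.DataFrame.from_records(list(product(range(10), range(10))))
df = df.sample(90, random_state=0)
df.columns = "c1 c2".split()
df = df.sort_values(df.columns.tolist()).reset_index(drop=True)

a = np.sort(df.to_numpy(), axis=1)
result = (df.assign(one=a[:, 0], two=a[:, 1])
            .drop_duplicates(["one", "two"])
            .reindex(df.columns, axis=1))

# Each unordered pair should appear exactly once in the result...
res_key = np.sort(result.to_numpy(), axis=1)
assert len(np.unique(res_key, axis=0)) == len(result)
# ...and no unordered pair from the input should be lost.
assert len(np.unique(a, axis=0)) == len(result)
print(len(df), "->", len(result))
```

Because the frame is sorted by (c1, c2), the mirror (j, i) of a pair (i, j) with i > j always comes first, so it is the row that is kept.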

Tags: python, pandas, numpy

The Unfun Cat
