Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count unique combinations regardless of column order

I have two columns of categories, with the same possible options showing up in both columns, and I'm looking to count the number of rows per unique combination, regardless of order of the categories in each row (column A or column B).

Example dataframe:

df1 = pd.DataFrame({'A':['dog','cat','mouse','bunny','cat','mouse','dog'],
                   'B':['cat','dog','dog','mouse','bunny','bunny','cat']})

Which would look like:

----------------------------
index         A        B
0           dog       cat
1           cat       dog
2           mouse     dog
3           bunny     mouse
4           cat       bunny
5           mouse     bunny
6           dog       cat
-----------------------------

And I'd like to result to look like:

--------------------------------------
index         A        B        count
0           dog       cat       3
1           bunny     mouse     2
2           mouse     dog       1
3           cat       bunny     1
--------------------------------------

I can get counts of combinations where dog/cat is counted separately from cat/dog but I'm struggling with how to get counts regardless of which category is first.

like image 680
Mark Avatar asked Oct 16 '25 02:10

Mark


1 Answers

Another solution, using .groupby:

x = (
    df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
    .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
    .reset_index(drop=True)
)
print(x)

Prints:

       A      B  count
0    cat  bunny      1
1  bunny  mouse      2
2    dog    cat      3
3  mouse    dog      1
like image 130
Andrej Kesely Avatar answered Oct 18 '25 19:10

Andrej Kesely



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!