
Better way to compute co-occurrences in Pandas

Tags: python, pandas

I have a dataframe with boolean columns each indicating whether a record belongs to a category:

import pandas as pd

example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})

example:

    is_a    is_b    is_c
0   True    False   True
1   False   False   False
2   True    False   False
3   True    True    True

I want to count the number of co-occurrences between each pair of categories. I'm currently doing this:

cols = ["is_a", "is_b", "is_c"]
output = pd.DataFrame(
    {x: [(example[x] & example[y]).sum() for y in cols] for x in cols},
    index=cols,
)

output:

     is_a is_b is_c
is_a    3    1    2
is_b    1    1    1
is_c    2    1    2

This gives the right output, but I'm wondering whether there is a better way to do it.

Edgar Ramírez Mondragón asked Sep 02 '25 15:09


1 Answer

dot

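This uses the pandas.DataFrame.dot method via the @ operator. After casting the booleans to integers, entry (i, j) of d.T @ d counts the rows in which both column i and column j are True.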

(lambda d: d.T @ d)(example.astype(int))

      is_a  is_b  is_c
is_a     3     1     2
is_b     1     1     1
is_c     2     1     2
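
As a sanity check, the following sketch compares the matrix product with the pairwise loop from the question on the same example data. The names via_dot, via_loop and same are illustrative only and do not appear in the original post.

```python
import pandas as pd

example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})

# Matrix-product approach: cast booleans to 0/1, then multiply the
# transpose by the frame itself (d.T.dot(d) is the same as d.T @ d).
d = example.astype(int)
via_dot = d.T.dot(d)

# Pairwise loop from the question, kept here only for comparison.
cols = list(example.columns)
via_loop = pd.DataFrame(
    {x: [int((example[x] & example[y]).sum()) for y in cols] for x in cols},
    index=cols,
)

# Compare the raw values so that integer width differences between
# platforms do not affect the result.
same = bool((via_dot.to_numpy() == via_loop.to_numpy()).all())
print(same)
```

Both approaches should agree on any all-boolean frame; the matrix product simply replaces the explicit double loop over column pairs with a single vectorized call.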

Same thing, but using a NumPy ndarray instead:

a = example.to_numpy().astype(int)
pd.DataFrame(a.T @ a, example.columns, example.columns)

      is_a  is_b  is_c
is_a     3     1     2
is_b     1     1     1
is_c     2     1     2
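
If this is needed in more than one place, the ndarray version could be wrapped in a small helper. This is only a sketch: the function name cooccurrence_counts is hypothetical, and it assumes every column of the input is boolean. It also shows how to read the result: the diagonal holds each category's own total, because a column always co-occurs with itself.

```python
import numpy as np
import pandas as pd


def cooccurrence_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count pairwise co-occurrences between boolean columns.

    Hypothetical helper; assumes every column of df is boolean.
    """
    a = df.to_numpy().astype(int)  # True/False -> 1/0
    return pd.DataFrame(a.T @ a, index=df.columns, columns=df.columns)


example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})

counts = cooccurrence_counts(example)

# The diagonal equals the per-column totals of True values.
diagonal = np.diag(counts.to_numpy())
totals = example.sum().to_numpy()
print(counts)
print((diagonal == totals).all())
```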

piRSquared answered Sep 07 '25 23:09