I have a DataFrame with boolean columns, each indicating whether a record belongs to a category:
import pandas as pd
example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})
example:
    is_a   is_b   is_c
0   True  False   True
1  False  False  False
2   True  False  False
3   True   True   True
I want to count the number of co-occurrences between each pair of categories. I'm currently doing this:
cols = ["is_a", "is_b", "is_c"]
output = pd.DataFrame(
    {x: [(example[x] & example[y]).sum() for y in cols] for x in cols},
    index=cols,
)
output:
      is_a  is_b  is_c
is_a     3     1     2
is_b     1     1     1
is_c     2     1     2
This gives the correct output, but is there a better (more concise or vectorized) way to do it?
dot
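This uses the pandas method pandas.DataFrame.dot through the @ operator. Converting the booleans to integers first means each cell of d.T @ d is the number of rows where both columns are True: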
(lambda d: d.T @ d)(example.astype(int))
      is_a  is_b  is_c
is_a     3     1     2
is_b     1     1     1
is_c     2     1     2
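Why this works, as a small sketch using the example data from the question: with 0/1 values, cell (x, y) of d.T @ d is the sum over rows of d[r, x] * d[r, y], and each product is 1 only when both flags are set. The astype(int) cast matters; as far as I know, NumPy multiplies boolean matrices with logical AND/OR, so without the cast you get "does any row have both?" rather than a count. The names bool_product and counts are just illustrative.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})

# Without the cast: a boolean matrix product combines entries with AND/OR,
# so each cell only says whether *any* row has both flags set.
bool_product = example.to_numpy().T @ example.to_numpy()
print(bool_product.dtype)  # bool

# With 0/1 integers: cell (x, y) = sum over rows of d[r, x] * d[r, y],
# i.e. the number of rows where columns x and y are both True.
d = example.astype(int)
counts = d.T @ d

# Cross-check an off-diagonal cell against a direct pairwise count...
assert counts.loc["is_a", "is_c"] == (example["is_a"] & example["is_c"]).sum()
# ...and the diagonal against the per-column True counts.
assert (np.diag(counts.to_numpy()) == example.sum().to_numpy()).all()
print(counts)
```

The diagonal therefore holds each category's own size, and the matrix is symmetric because x & y and y & x count the same rows.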
Same thing, but using a NumPy ndarray instead:
a = example.to_numpy().astype(int)
pd.DataFrame(a.T @ a, example.columns, example.columns)
      is_a  is_b  is_c
is_a     3     1     2
is_b     1     1     1
is_c     2     1     2
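If you need this in more than one place, the ndarray version can be wrapped in a small function. This is a sketch; cooccurrence_counts is a hypothetical name, and it assumes every column of the input is boolean. It uses the dtype argument of DataFrame.to_numpy to do the 0/1 conversion in one step, then cross-checks the result against the pairwise loop from the question.

```python
import pandas as pd

example = pd.DataFrame({
    "is_a": [True, False, True, True],
    "is_b": [False, False, False, True],
    "is_c": [True, False, False, True],
})


def cooccurrence_counts(flags: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pairwise co-occurrence counts for boolean columns."""
    # to_numpy(dtype=int) maps True/False to 1/0 in a single step.
    a = flags.to_numpy(dtype=int)
    # For k columns, a.T @ a is a k x k matrix whose [i, j] entry counts
    # the rows in which columns i and j are both True.
    return pd.DataFrame(a.T @ a, index=flags.columns, columns=flags.columns)


# Reference result built with the pairwise loop from the question.
cols = list(example.columns)
loop_output = pd.DataFrame(
    {x: [(example[x] & example[y]).sum() for y in cols] for x in cols},
    index=cols,
)

result = cooccurrence_counts(example)
# Same values and labels; integer width may differ by platform, so skip dtype.
pd.testing.assert_frame_equal(result, loop_output, check_dtype=False)
print(result)
```

The loop performs k² separate pandas operations for k columns, whereas the matrix product does the same pairwise work in a single call, which is where the conciseness (and typically the speed) comes from.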