I have a Pandas
DataFrame
that looks like this:
MemberID A B C D
1 0.3 0.5 0.1 0
2 0 0.2 0.9 0.3
3 0.4 0.2 0.5 0.3
4 0.1 0 0 0.7
I would like to have another matrix which gives me the number of non-zero elements for the intersection of every column except for MemberID
.
For example, the intersection of columns A
and B
would be 2 (because MemberID
1 and 3 have non-zero values for A
and B
), intersection of A
and C
would be 2 as well (because MemberID
1 and 3 have non-zero values for A
and C
).
The final matrix would look like this:
A B C D
A 3 2 2 2
B 2 3 3 2
C 2 3 3 2
D 2 2 2 3
As we can see, it should be a symmetric matrix, similar to a correlation matrix, but not the correlation matrix.
Intersection of any 2 columns = # of MemberID
having non-zero values in both columns.
I would show some initial code here but I feel like there would be a simple function to do this task that I don't know of.
Here's the code to create the DataFrame
:
df = pd.DataFrame([[0.3, 0.5, 0.1, 0],
[0, 0.2, 0.9, 0.3],
[ 0.4, 0.2, 0.5, 0.3],
[ 0.1, 0, 0, 0.7]],
columns=list('ABCD'))
Any pointers would be appreciated. TIA.
This should to it:
z = (df != 0) * 1
z.T.dot(z)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With