Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count of rows where given columns of a DataFrame are non-zero

Tags:

I have a Pandas DataFrame that looks like this:

MemberID    A    B    C    D
1           0.3  0.5 0.1   0
2           0    0.2 0.9   0.3
3           0.4  0.2 0.5   0.3
4           0.1  0   0     0.7

I would like to have another matrix which gives me the number of non-zero elements for the intersection of every column except for MemberID.

For example, the intersection of columns A and B would be 2 (because MemberID 1 and 3 have non-zero values for A and B), intersection of A and C would be 2 as well (because MemberID 1 and 3 have non-zero values for A and C).

The final matrix would look like this:

    A    B    C    D
A   3    2    2    2
B   2    3    3    2
C   2    3    3    2
D   2    2    2    3

As we can see, it should be a symmetric matrix, similar to a correlation matrix, but not the correlation matrix.

Intersection of any 2 columns = # of MemberID having non-zero values in both columns.

I would show some initial code here but I feel like there would be a simple function to do this task that I don't know of.

Here's the code to create the DataFrame:

df = pd.DataFrame([[0.3, 0.5,  0.1, 0],
                   [0,  0.2,  0.9, 0.3],
                   [ 0.4,  0.2,  0.5, 0.3],
                   [ 0.1, 0, 0,  0.7]],
                  columns=list('ABCD'))

Any pointers would be appreciated. TIA.

like image 746
Patthebug Avatar asked Jul 18 '16 21:07

Patthebug


1 Answers

This should to it:

z = (df != 0) * 1
z.T.dot(z)

enter image description here

like image 182
piRSquared Avatar answered Oct 11 '22 09:10

piRSquared