 

Compute co-occurrence matrix by counting values in cells

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 0, 0], 'b': [0, 1, 1, 0], 'c': [0, 0, 1, 1]})

I want to get:

  a b c
a 2 1 0
b 1 2 1
c 0 1 2

where a, b, c are the column names, and each cell counts the 1s in one column among the rows where another column is 1. For example, among the rows where df.a == 1, column a has two 1s, b has one, and c has none, so row a is 2, 1, 0; rows b and c work the same way.
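To pin the definition down, here is a literal, per-cell reading of it: entry (j, k) is the number of rows where column j and column k are both 1. This is only an illustrative sketch (a nested comprehension over column pairs, which scales poorly), using the example frame from above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 0, 0], 'b': [0, 1, 1, 0], 'c': [0, 0, 1, 1]})

# Entry (j, k): number of rows where column j == 1 and column k == 1.
result = pd.DataFrame(
    [[int(((df[j] == 1) & (df[k] == 1)).sum()) for k in df.columns] for j in df.columns],
    index=df.columns,
    columns=df.columns,
)
print(result)
```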

I wrote a loop to solve it:

matrix = []
for _, values in df.items():  # df.iteritems() was removed in pandas 2.0
    # keep the rows where this column is 1, then count the 1s in every column
    matrix.append((df[values == 1] == 1).sum().tolist())
pd.DataFrame(matrix, index=df.columns, columns=df.columns)

But I think there must be a simpler solution. Is there?

asked May 10 '18 at 18:05 by Edward


2 Answers

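You appear to want the matrix product, so leverage DataFrame.dot. Because every cell is 0 or 1, df[j] * df[k] is 1 only in rows where both columns are 1, so entry (j, k) of df.T.dot(df) counts exactly those rows: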

df.T.dot(df)
   a  b  c
a  2  1  0
b  1  2  1
c  0  1  2
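The product counts 1s only because every cell here is 0 or 1. If the frame could hold other values (a 2, or a NaN), a hedged variant is to build a 0/1 indicator with `(df == 1).astype(int)` first and take its product instead. The extra values in this sketch are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same layout as the question, with a 2 and a NaN mixed in.
df = pd.DataFrame({'a': [1, 1, 0, 2], 'b': [0, 1, 1, np.nan], 'c': [0, 0, 1, 1]})

# Indicator matrix: 1 where the cell equals 1, else 0 (NaN == 1 is False).
ind = (df == 1).astype(int)

# ind.T @ ind counts, for each pair of columns, the rows where both are 1.
cooc = ind.T @ ind
print(cooc)
```

Here the indicator matches the original 0/1 frame, so the result is the same 3x3 matrix, whereas a plain df.T.dot(df) on this data would be skewed by the 2 and the NaN.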

Alternatively, to avoid the pandas overhead, compute the product on the underlying NumPy array with np.dot and wrap the result back up with the column labels:

v = df.values
pd.DataFrame(v.T.dot(v), index=df.columns, columns=df.columns)
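Two properties make a quick sanity check on the NumPy route: the result is symmetric, and its diagonal is just the number of 1s in each column (the column sums, for 0/1 data). A minimal sketch, using `to_numpy()`, the newer spelling of `.values`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 0, 0], 'b': [0, 1, 1, 0], 'c': [0, 0, 1, 1]})

v = df.to_numpy()
prod = v.T @ v  # same as v.T.dot(v)
out = pd.DataFrame(prod, index=df.columns, columns=df.columns)

# Co-occurrence is symmetric: "j and k both 1" is the same set of rows as "k and j both 1".
assert np.array_equal(prod, prod.T)
# Diagonal entry (j, j) counts rows where column j is 1, i.e. the column sum for 0/1 data.
assert np.array_equal(np.diag(prod), v.sum(axis=0))
print(out)
```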

Or, if you want to get cute,

(lambda a, c: pd.DataFrame(a.T.dot(a), c, c))(df.values, df.columns)

   a  b  c
a  2  1  0
b  1  2  1
c  0  1  2

—piRSquared

answered Sep 28 '22 at 04:09 by cs95


np.einsum

Not as pretty as df.T.dot(df), but how often do you see np.einsum, amirite?

pd.DataFrame(np.einsum('ij,ik->jk', df, df), df.columns, df.columns)

   a  b  c
a  2  1  0
b  1  2  1
c  0  1  2
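In the subscript string 'ij,ik->jk', i indexes rows and j, k index columns; because i does not appear in the output, einsum sums over it, so each output cell is the sum over rows of df[i, j] * df[i, k], which is the same contraction as df.T.dot(df). A small sketch that checks this against explicit loops:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 0, 0], 'b': [0, 1, 1, 0], 'c': [0, 0, 1, 1]})
v = df.to_numpy()

# 'ij,ik->jk': i (rows) is absent from the output, so einsum sums over it.
via_einsum = np.einsum('ij,ik->jk', v, v)

# The same contraction written as explicit loops over the output cells.
n_cols = v.shape[1]
via_loops = np.zeros((n_cols, n_cols), dtype=v.dtype)
for j in range(n_cols):
    for k in range(n_cols):
        via_loops[j, k] = (v[:, j] * v[:, k]).sum()

assert np.array_equal(via_einsum, via_loops)
assert np.array_equal(via_einsum, v.T @ v)
result = pd.DataFrame(via_einsum, df.columns, df.columns)
print(result)
```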
answered Sep 28 '22 at 05:09 by piRSquared