I have generated a co-occurrence matrix by using the Python pandas library, with the following code:
import pandas as pd

# dfdo is an ordered dictionary with a key called KEY453
df = pd.DataFrame(dfdo).set_index('KEY453')
df_asint = df.astype(int)
# Columns-by-columns product gives the co-occurrence counts
com = df_asint.T.dot(df_asint)
It follows the same procedure as this question.
My question is: how can I find the top 2 strings that co-occur with a given string in the matrix? For example, the top 2 strings that co-occur with Dog in the example below are Cat and Zebra.
       Cat  Dog  Zebra
Cat      0    2      3
Dog      2    0      1
Zebra    3    1      0
I think you can use nlargest (here df holds the co-occurrence matrix shown above):
print (df.loc['Dog'].nlargest(2))
Cat      2
Zebra    1
Name: Dog, dtype: int64
print (df.loc['Dog'].nlargest(2).index)
Index(['Cat', 'Zebra'], dtype='object')
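The lookup above can be wrapped in a small helper. This is only a sketch: the function name top_cooccurring is made up for illustration, and the matrix is rebuilt by hand from the example in the question. It also drops the string's own entry first, on the assumption that a matrix built with df.T.dot(df) may have non-zero counts on the diagonal, which would otherwise rank a string as its own best match.

```python
import pandas as pd

# Example matrix copied from the question (the diagonal is zero here).
com = pd.DataFrame(
    [[0, 2, 3], [2, 0, 1], [3, 1, 0]],
    index=["Cat", "Dog", "Zebra"],
    columns=["Cat", "Dog", "Zebra"],
)

def top_cooccurring(matrix, label, n=2):
    """Return the n labels that co-occur most often with `label`.

    Hypothetical helper, not part of pandas.
    """
    # Drop the label's own entry so a non-zero diagonal cannot win.
    row = matrix.loc[label].drop(label)
    return row.nlargest(n).index.tolist()

result = top_cooccurring(com, "Dog")
print(result)
```

For the example matrix this should list Cat first and Zebra second for Dog, matching the output above.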
If you need the top 2 for every row of the DataFrame, use numpy.argsort:
print (np.argsort(-df.values, axis=1)[:, :2])
[[2 1]
 [0 2]
 [0 1]]
# Recent pandas versions reject a 2-D indexer on an Index, so index the underlying array instead
print (df.columns.to_numpy()[np.argsort(-df.values, axis=1)[:, :2]])
[['Zebra' 'Dog']
 ['Cat' 'Zebra']
 ['Cat' 'Dog']]
print (pd.DataFrame(df.columns.to_numpy()[np.argsort(-df.values, axis=1)[:, :2]],
                    index=df.index,
                    columns=['first','second']))
       first second
Cat    Zebra    Dog
Dog      Cat  Zebra
Zebra    Cat    Dog
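Here is a self-contained sketch of the argsort approach that can be run as-is. Two things are my own assumptions rather than part of the answer: the diagonal is masked with -inf so a string is never returned as its own neighbour (relevant if your real matrix has non-zero self-counts), and a stable sort is requested so ties are broken by column order.

```python
import numpy as np
import pandas as pd

# Example matrix copied from the question.
com = pd.DataFrame(
    [[0, 2, 3], [2, 0, 1], [3, 1, 0]],
    index=["Cat", "Dog", "Zebra"],
    columns=["Cat", "Dog", "Zebra"],
)

# Work on a float copy so -inf is representable and com stays untouched.
values = com.to_numpy(dtype=float, copy=True)
np.fill_diagonal(values, -np.inf)  # exclude each label's own column

# Sort each row in descending order and keep the first two column positions.
order = np.argsort(-values, axis=1, kind="stable")[:, :2]

# Map the column positions back to labels.
top2 = pd.DataFrame(
    com.columns.to_numpy()[order],
    index=com.index,
    columns=["first", "second"],
)
print(top2)
```

On the example matrix this should reproduce the first/second table shown above.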
Or use apply, which is simpler to read but slower on large matrices because it runs nlargest once per row:
print (df.apply(lambda x: pd.Series(x.nlargest(2).index, index=['first','second']), axis=1))
       first second
Cat    Zebra    Dog
Dog      Cat  Zebra
Zebra    Cat    Dog
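Option 1: stack, then nlargest. Note that this returns the strongest pair in the whole matrix, not the top matches for one given string.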
df.stack().nlargest(1)
Cat Zebra 3
dtype: int64
Option 2: stack, then idxmax
df.stack().idxmax()
('Cat', 'Zebra')
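Because the matrix is symmetric, every pair shows up twice after stacking, once as (Cat, Zebra) and once as (Zebra, Cat). If you want more than one top pair, a possible sketch is to keep only the entries above the diagonal before stacking. The masking step with np.triu is my addition and assumes a square, symmetric matrix like the one in the question.

```python
import numpy as np
import pandas as pd

# Example matrix copied from the question.
com = pd.DataFrame(
    [[0, 2, 3], [2, 0, 1], [3, 1, 0]],
    index=["Cat", "Dog", "Zebra"],
    columns=["Cat", "Dog", "Zebra"],
)

# Boolean mask that is True strictly above the diagonal.
mask = np.triu(np.ones(com.shape, dtype=bool), k=1)

# Entries outside the mask become NaN; drop them explicitly after stacking.
pairs = com.where(mask).stack().dropna()

top_pairs = pairs.nlargest(2)  # the two strongest distinct pairs
best_pair = pairs.idxmax()     # the single strongest pair
print(top_pairs)
print(best_pair)
```

The values come back as floats because the masked entries are NaN before they are dropped; cast with astype(int) if you need integer counts.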