Python - calculate the co-occurrence matrix

Tags: python, machine-learning, matrix, nlp

I'm working on an NLP task and I need to calculate the co-occurrence matrix over a set of documents. The basic formulation is as follows:

I have a matrix of shape (n, length), where each row represents a sentence of length words, so there are n sentences in all, each of the same length. Given a context size, e.g. window_size = 5, I want to calculate the co-occurrence matrix D, where the entry in the cth row and wth column is #(w,c), the number of times that the context word c appears in w's context.
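To make the definition of #(w,c) concrete, here is a minimal sketch for a single toy sentence. It assumes that window_size counts words on each side of w and that a word is not counted as its own context; the name count_pairs and the toy data are made up for illustration.

```python
from collections import Counter

def count_pairs(sentence, window_size):
    """Return a Counter mapping (w, c) to #(w, c) for one sentence."""
    counts = Counter()
    for i, w in enumerate(sentence):
        # Clip the window to the sentence boundaries.
        lo = max(i - window_size, 0)
        hi = min(i + window_size + 1, len(sentence))
        for j in range(lo, hi):
            if j != i:  # assumption: a word is not its own context
                counts[(w, sentence[j])] += 1
    return counts

toy = ["a", "b", "a", "c"]
pairs = count_pairs(toy, window_size=1)
print(pairs[("a", "b")])  # 2: "b" is next to "a" at positions 0 and 2
```

Summing these counts over all n sentences gives the entries of D.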

For an example, see the question "How to calculate the co-occurrence between two words in a window of text?"

I know it can be calculated with nested loops, but I want to know whether there exists a simpler way or a ready-made function. I have found some answers, but they do not work with a window sliding through the sentence. For example: "word-word co-occurrence matrix".

So could anyone tell me whether there is a function in Python that can handle this problem concisely? I think this task is quite common in NLP.


asked Jan 15 '17 13:01

GEORGE GUO

1 Answer

It is not that complicated, I think. Why not write a function yourself? First get the matrix X, in which each row is a sentence encoded as word indices; the vocabulary can be built by following this tutorial: http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage. Then, for each sentence, count the co-occurrences and add them to a running total.

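import numpy as np
window = 5  # context words counted on each side of the center word
vocab_size = int(np.max(X)) + 1  # X holds word indices, shape (n, length)
m = np.zeros((vocab_size, vocab_size), dtype=np.int64)  # m[w, c] = #(w, c)

def cal_occ(sentence, m):
    for i, word in enumerate(sentence):
        # clip the window to the sentence; +1 because range() excludes its end
        for j in range(max(i - window, 0), min(i + window + 1, len(sentence))):
            if j != i:  # a word is not its own context
                m[word, sentence[j]] += 1

for sentence in X:
    cal_occ(sentence, m)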
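If the Python loops turn out to be slow, the same counting can be done per window offset with NumPy. This is only a sketch under stated assumptions: X is an (n, length) integer array of word indices, the window extends `window` words on each side, and a word is not counted as its own context. The function name cooccurrence_matrix and the toy data are made up for illustration.

```python
import numpy as np

def cooccurrence_matrix(X, vocab_size, window):
    """Count, for every pair (w, c), how often c is within `window` words of w."""
    X = np.asarray(X)
    m = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    length = X.shape[1]
    # For each distance d = 1..window, pair every word with the word d steps
    # to its right, then count the pair in both directions.
    for offset in range(1, min(window, length - 1) + 1):
        left = X[:, :-offset].ravel()
        right = X[:, offset:].ravel()
        # np.add.at accumulates correctly when the same index pair repeats.
        np.add.at(m, (left, right), 1)
        np.add.at(m, (right, left), 1)
    return m

# Toy data: two sentences of length 4 over a vocabulary of 3 word indices.
X_toy = np.array([[0, 1, 0, 2],
                  [2, 2, 1, 0]])
m_toy = cooccurrence_matrix(X_toy, vocab_size=3, window=1)
print(m_toy)
```

Here m[w, c] holds #(w, c). With a symmetric window the matrix is symmetric, so it coincides with the D of the question, which puts c in the row and w in the column. For a large vocabulary a dense matrix may not fit in memory, in which case a sparse matrix or a dictionary of counts is a more practical container.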


answered Sep 23 '22 16:09

Zealseeker
