How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). Each value in the matrix is the number of times two items occurred together. How can I determine the similarity between two items? From my reading, I found a few options, but I am not sure about these approaches. Any input to get started is appreciated.

Use cosine similarity between the two items' vectors.
Turn the matrix into a graph and use a measure like SimRank to compute similarity, possibly using the co-occurrence count as the edge weight between two nodes.

asked Feb 01 '17 07:02

kitchenprinzessin

3 Answers

I would recommend using cosine similarity (e.g. the cosine distance in scipy.spatial.distance). Alternatively, you could calculate the Jaccard similarity for each item pair.

After calculating either similarity matrix (affinity matrix), you can use a spectral (or spatial) clustering algorithm, such as sklearn's SpectralClustering, to group those items.
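
A minimal sketch of the steps above, assuming a small made-up symmetric matrix named cooc in place of the real 1877 x 1877 one. The Jaccard variant here treats each row as the set of items it co-occurs with at least once, which is only one possible reading. The choice of n_clusters=2 is arbitrary for the toy data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy co-occurrence counts (stand-in for the 1877 x 1877 matrix).
cooc = np.array([
    [0, 5, 4, 0, 0, 0],
    [5, 0, 6, 0, 1, 0],
    [4, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 5],
    [0, 1, 0, 7, 0, 6],
    [0, 0, 0, 5, 6, 0],
], dtype=float)

# Cosine similarity between rows: each item is described by its co-occurrence profile.
cos_sim = cosine_similarity(cooc)

# Jaccard similarity on binarized rows (assumption: "co-occurred at least once").
present = (cooc > 0).astype(float)
intersection = present @ present.T
row_sizes = present.sum(axis=1)
union = row_sizes[:, None] + row_sizes[None, :] - intersection
jaccard = np.divide(intersection, union, out=np.zeros_like(union), where=union > 0)

# Cluster using the cosine similarity matrix as a precomputed affinity matrix.
clusterer = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = clusterer.fit_predict(cos_sim)
print(labels)
```

Because the counts are non-negative, the cosine similarities are non-negative too, which is what a precomputed affinity matrix needs. Note that a zero diagonal (an item never "co-occurs" with itself) lowers the similarity of strongly linked pairs, so you may want to decide how to treat the diagonal first.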

answered Oct 22 '22 09:10

Nico

You can treat it as 1877 items with 1877 features each. If two items are similar, then their co-occurrences will be similar. Given that, you might use NearestNeighbors to find the closest ones. There are many metrics available.

Also, preprocessing the data may help you. I do not know its distribution, but you might want to normalize the values into the range [0, 1] or do something similar.
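
A rough sketch of this idea, assuming a small made-up matrix named cooc and scaling by the global maximum as one possible way to get values into [0, 1]. The metric and n_neighbors are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy co-occurrence counts (stand-in for the 1877 x 1877 matrix).
cooc = np.array([
    [0, 5, 4, 0, 0, 0],
    [5, 0, 6, 0, 1, 0],
    [4, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 5],
    [0, 1, 0, 7, 0, 6],
    [0, 0, 0, 5, 6, 0],
], dtype=float)

# One simple normalization (an assumption): divide by the largest count.
scaled = cooc / cooc.max()

# Treat each row as a feature vector and look up the closest rows.
nn = NearestNeighbors(n_neighbors=3, metric="cosine")
nn.fit(scaled)
distances, indices = nn.kneighbors(scaled)

# Column 0 is each item itself (distance ~0); the rest are its nearest items.
for item, neighbours in enumerate(indices):
    print(item, neighbours[1:])
```

Cosine distance ignores a global rescaling like this one, so the scaling step only changes the result for magnitude-sensitive metrics such as Euclidean distance.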

answered Oct 22 '22 09:10

mbednarski

If your co-occurrence matrix is symmetrical, you don't need to normalize it. For more information about normalizing symmetrical and asymmetrical co-occurrence matrices, see: Leydesdorff, L. and Vaughan, L., 2006. Co‐occurrence matrices and their applications in information science: Extending ACA to the Web environment. Journal of the American Society for Information Science and Technology, 57(12), pp.1616-1628.

answered Oct 22 '22 07:10

Hamed Baziyad

Tags:

python

matrix

cosine-similarity

find-occurrences