Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarities between two items? Through reading, i found few options. However i am not sure about these approaches. Any inputs to get started is appreciated.

  1. Use cosine to compute sim between two vectors
  2. Turn this into a graph, use measures like simrank to compute similarity - may use the occurrence count as a weight between two nodes.
like image 764
kitchenprinzessin Avatar asked Feb 01 '17 07:02

kitchenprinzessin


People also ask

How is co-occurrence calculated?

The most straightforward way to measure co-occurrence between two species is by the observed number of times that the two spe- cies co-occur relative to the expected number of times (San- derson, 2000; Sfenthourakis et al., 2004, 2006; Veech, 2006, 2013; Pitta et al., 2012).

What is a co-occurrence matrix used for?

A GLCM matrix is a method to calculate the spatial relationship of an image pixel.

What is co-occurrence matrix in recommendation system?

Item to Item Recommendations Based on Co-Occurrence Matrix The goal of co-occurrence recommendation machine learning algorithm is finding how many times two food have appeared together in the user historical data. For example, apple and banana appeared together twice in the user Ann and William.

What is co-occurrence matrix in NLP?

A co-occurrence matrix or co-occurrence distribution (also referred to as : gray-level co-occurrence matrices GLCMs) is a matrix that is defined over an image to be the distribution of co-occurring pixel values (grayscale values, or colors) at a given offset.


3 Answers

I would recommend using spatial cosine similarity. Alternatively you could calculate jaccard's similarity for each item pair.

After calculating either similarity matrix (affinity matrix) you can use a spectral (or spatial) clustering algorithm, such as sklearn's spectral clustering algorithm to group those items.

like image 56
Nico Avatar answered Oct 22 '22 09:10

Nico


You can thread it as 1877 items with 1877 features each. If two items are similar, than they co-occurrences will be similar. Given that you might use NearestNeighbors in order to find closest one. There are may metrics available.

Also, reprocessing the data may help you. I do not know it's distribution but you might want to normalize values into range [0;1] or doing sth like that.

like image 1
mbednarski Avatar answered Oct 22 '22 09:10

mbednarski


If your co-nonoccurence matrix is symmetrical, you don't need to normalize it. You can refer to this paper for gain more information about normalization of symmetrical and asymmetrical co-matrices: Leydesdorff, L. and Vaughan, L., 2006. Co‐occurrence matrices and their applications in information science: Extending ACA to the Web environment. Journal of the American Society for Information Science and technology, 57(12), pp.1616-1628. please, click hear

like image 1
Hamed Baziyad Avatar answered Oct 22 '22 07:10

Hamed Baziyad