I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarity between two items? Through reading, I found a few options, quoted below. However, I am not sure about these approaches. Any input to get started is appreciated.
The most straightforward way to measure co-occurrence between two species is by the observed number of times that the two species co-occur relative to the expected number of times (Sanderson, 2000; Sfenthourakis et al., 2004, 2006; Veech, 2006, 2013; Pitta et al., 2012).
A GLCM matrix is a method to calculate the spatial relationship of an image pixel.
Item to Item Recommendations Based on Co-Occurrence Matrix: The goal of the co-occurrence recommendation algorithm is to find how many times two foods have appeared together in the users' historical data. For example, apple and banana appeared together twice, in the histories of users Ann and William.
A co-occurrence matrix or co-occurrence distribution (also referred to as a gray-level co-occurrence matrix, GLCM) is a matrix that is defined over an image to be the distribution of co-occurring pixel values (grayscale values, or colors) at a given offset.
I would recommend using cosine similarity between the rows of the matrix. Alternatively, you could calculate the Jaccard similarity for each item pair.
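As a rough sketch of both measures, the snippet below uses a toy 3 x 3 matrix in place of the real 1877 x 1877 one. The Jaccard part assumes that the diagonal stores how often each item occurred at all, which the question does not state; if your diagonal is empty, you would need those per-item counts from the raw data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy symmetric co-occurrence matrix (stand-in for the real one).
# Assumption: C[i, i] is the total number of occurrences of item i.
C = np.array([
    [10, 4, 0],
    [4, 8, 2],
    [0, 2, 5],
], dtype=float)

# Cosine similarity between rows: items with similar
# co-occurrence profiles get a score close to 1.
cos_sim = cosine_similarity(C)

# Jaccard similarity from counts: |A and B| / |A or B|,
# where |A or B| = count(A) + count(B) - cooccurrence(A, B).
occ = np.diag(C)
union = occ[:, None] + occ[None, :] - C
jaccard = np.divide(C, union, out=np.zeros_like(C), where=union > 0)

print(np.round(cos_sim, 3))
print(np.round(jaccard, 3))
```

Cosine compares whole rows, so it also captures items that rarely co-occur directly but share the same partners; Jaccard only looks at the pair itself.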
After calculating either similarity matrix (affinity matrix), you can use a spectral clustering algorithm, such as sklearn's SpectralClustering, to group those items.
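A minimal sketch of that clustering step, assuming a cosine affinity matrix and a toy 6 x 6 co-occurrence matrix with two obvious groups. The number of clusters (2 here) is an assumption you would have to choose for your own data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

# Toy co-occurrence matrix: items 0-2 co-occur often, as do items 3-5.
C = np.array([
    [9, 7, 6, 0, 1, 0],
    [7, 9, 8, 1, 0, 0],
    [6, 8, 9, 0, 0, 1],
    [0, 1, 0, 9, 7, 8],
    [1, 0, 0, 7, 9, 6],
    [0, 0, 1, 8, 6, 9],
], dtype=float)

# Counts are non-negative, so the cosine similarities are too,
# which is what a precomputed affinity matrix needs.
affinity = cosine_similarity(C)

# affinity="precomputed" tells sklearn that we pass similarities, not features.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)

print(labels)
```

The label values themselves are arbitrary; only which items share a label matters.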
You can treat it as 1877 items with 1877 features each. If two items are similar, then their co-occurrences will be similar. Given that, you might use NearestNeighbors in order to find the closest ones. There are many metrics available.
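Something along these lines, treating each row as a feature vector. The toy matrix, the cosine metric, and `n_neighbors=2` are all assumptions for illustration; other metrics supported by sklearn can be swapped in.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy co-occurrence matrix: each row is one item's feature vector.
C = np.array([
    [9, 7, 6, 0, 1, 0],
    [7, 9, 8, 1, 0, 0],
    [6, 8, 9, 0, 0, 1],
    [0, 1, 0, 9, 7, 8],
    [1, 0, 0, 7, 9, 6],
    [0, 0, 1, 8, 6, 9],
], dtype=float)

nn = NearestNeighbors(n_neighbors=2, metric="cosine")
nn.fit(C)

# Calling kneighbors() with no argument queries the fitted items
# and does not count an item as its own neighbour.
distances, indices = nn.kneighbors()

for item, neighbours in enumerate(indices):
    print(item, "->", neighbours.tolist())
```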
Also, preprocessing the data may help you. I do not know its distribution, but you might want to normalize the values into the range [0, 1] or do something similar.
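Two possible ways to do that, sketched on a toy matrix. Which one is appropriate depends on your data, so take both as options rather than a recommendation.

```python
import numpy as np

# Toy co-occurrence matrix (stand-in for the real one).
C = np.array([
    [10, 4, 0],
    [4, 8, 2],
    [0, 2, 5],
], dtype=float)

# Option 1: global min-max scaling into [0, 1].
# Keeps the matrix symmetric but is sensitive to a few very large counts.
scaled = (C - C.min()) / (C.max() - C.min())

# Option 2: row normalisation, so each row sums to 1.
# Note that this makes the matrix asymmetric.
row_sums = C.sum(axis=1, keepdims=True)
row_norm = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

print(np.round(scaled, 3))
print(np.round(row_norm, 3))
```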
If your co-occurrence matrix is symmetrical, you don't need to normalize it. You can refer to this paper for more information about the normalization of symmetrical and asymmetrical co-occurrence matrices: Leydesdorff, L. and Vaughan, L., 2006. Co‐occurrence matrices and their applications in information science: Extending ACA to the Web environment. Journal of the American Society for Information Science and Technology, 57(12), pp.1616-1628.