Clustering a sparse co-occurrence matrix

Question

I have two N x N co-occurrence matrices (484x484 and 1060x1060) that I need to analyze. The matrices are symmetric about the diagonal and contain many zeros; the non-zero values are integers.

I want to group the non-zero positions together. In other words, I want to do what the algorithm at this link does: when "order by cluster" is selected, the rows and columns of the matrix are rearranged so that the non-zero values are grouped together.

Since I am using Python for this task, I looked into SciPy's sparse linear algebra library, but couldn't find what I was looking for.

Any help is much appreciated. Thanks in advance.

Leo Martins · Accepted Answer

If you have a matrix dist with pairwise distances between objects, you can find the order in which to rearrange the matrix by applying a clustering algorithm to it (http://scikit-learn.org/stable/modules/clustering.html). For example, it might look something like this:

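from sklearn import cluster
import numpy as np
# dist: square (n x n) matrix of pairwise distances (see the notes below)
# metric= was called affinity= before scikit-learn 1.2; the default "ward"
# linkage only accepts Euclidean data, so another linkage is needed here
model = cluster.AgglomerativeClustering(n_clusters=20, metric="precomputed", linkage="average").fit(dist)
new_order = np.argsort(model.labels_, kind="stable")  # group rows by cluster label
ordered_dist = dist[new_order]  # can be your original matrix instead of dist[]
ordered_dist = ordered_dist[:, new_order]  # apply the same order to the columns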

The order is derived from model.labels_, which holds the cluster number assigned to each sample; sorting by these labels puts samples from the same cluster next to each other. A few observations:

- You have to find a clustering algorithm that accepts a distance matrix as input. AgglomerativeClustering is such an algorithm: the "precomputed" option (metric="precomputed" since scikit-learn 1.2; older releases call this parameter affinity) tells it that we are passing pre-computed distances. With precomputed distances you also need a linkage other than the default "ward", e.g. linkage="average".
- What you have seems to be a pairwise similarity matrix (larger counts mean more similar), in which case you need to convert it to a distance matrix first, e.g. dist = 1 - data / data.max().
- In the example I assumed 20 clusters; you will probably need to tune this value. Alternatively, you could look for a good one-dimensional representation of your data (e.g. with MDS) and order the samples by their coordinate along that axis.
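The one-dimensional alternative can be sketched with classical (Torgerson) MDS written directly in NumPy. This is an assumption on my part: scikit-learn's sklearn.manifold.MDS uses the iterative SMACOF algorithm instead, which can also produce a 1-D embedding, while the eigendecomposition below is deterministic. The function names classical_mds_1d and order_by_mds are hypothetical.

```python
import numpy as np


def classical_mds_1d(dist):
    """Classical (Torgerson) MDS: 1-D coordinates approximating the distances in `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns the eigenvalues of a symmetric matrix in ascending order,
    # so the last eigenpair gives the best 1-D axis.
    eigvals, eigvecs = np.linalg.eigh(b)
    return eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))


def order_by_mds(dist):
    """Permutation that sorts samples along the 1-D MDS axis."""
    return np.argsort(classical_mds_1d(dist), kind="stable")


# Toy check: points on a line, presented in shuffled order.
positions = np.array([3.0, 0.0, 7.0, 1.0, 5.0])
dist = np.abs(positions[:, None] - positions[None, :])
order = order_by_mds(dist)
print(positions[order])
```

The sign of an eigenvector is arbitrary, so the recovered order may come out reversed; either direction groups similar samples together. For the co-occurrence data you would pass the distance matrix built as in the previous point and reorder with np.ix_(order, order).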

Tags: python, matrix, cluster-analysis, sparse-matrix

reincore
