i have a distance matrix that is 1000x1000 in dimension and symmetric with 0s along the diagonal. i want to form groupings of distances (clusters) by simultaneously reordering the rows and columns of the matrix. this is like reordering a matrix before you visualize its clusters with a heatmap. i feel like this should be an easy problem, but i am not having much luck finding code that does the permutations online. can anyone help?
Here is one approach that came to mind:
I will use Octave (everything I am doing should also work in Matlab) since it has a Reverse Cuthill-McKee (RCM) implementation built in.
First, we need to generate a distance matrix. This function creates a random set of points and their distance matrix:
function [x, y, A] = make_rand_dist_matrix(n)
x = rand(n, 1);
y = rand(n, 1);
A = sqrt((repmat(x, 1, n) - repmat(x', n, 1)).^2 +
(repmat(y, 1, n) - repmat(y', n, 1)).^2);
end
Let's use that to generate and visualize a 100-point example.
[x, y, A] = make_rand_dist_matrix(100);
surf(A);
Viewing the surface plot from above gets the image below (yours will be different, of course).
Warm colors represent greater distances than cool colors. Row (or column, if you prefer) i
in the matrix contains the distances between point i
and all points. The distance between point i
and point j
is in entry A(i, j)
. Our goal is to reorder the matrix so that the row corresponding to point i
is near rows corresponding to points a short distance from i
.
A simple way to sparsify A
is to make all entries greater than some threshold zero, and that is what is done below, although more sophisticated approaches may prove more effective.
B = A < 0.2; % sparsify A -- only values less than 0.2 are nonzeros in B
p = symrcm(B); % compute reordering by Reverse Cuthill-McKee
surf(A(p, p)); % visualize reordered distance matrix
The matrix is now ordered in a way that brings nearby points closer together in the matrix. This result is not optimal, of course. Sparse matrix bandwidth compression is computed using heuristics, and RCM is a very simple approach. As I mentioned above, more sophisticated approaches for producing the sparse matrix may give better results, and different algorithms may also yield better results for the problem.
Another way to look at what happened is to plot the points and connect a pair of points if their corresponding rows in the matrix are adjacent. Your goal is to have the lines connecting pairs of points that are near each other. For a more dramatic effect, we use a larger set of points than above.
[x, y, A] = make_rand_dist_matrix(2000);
plot(x, y); % plot the points in their initial, random order
Clearly, connections are all over the place and are occurring over a wide variety of distances.
B = A < 0.2; % sparsify A
p = symrcm(B);
plot(x(p), y(p)) % plot the reordered points
After reordering, the connections tend to be over much smaller distances and much more orderly.
Two Matlab functions do this: symrcm and symamd. Note that there is no unique solution to this problem. Clustering is another approach.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With