Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which algorithms can be used to generate a euclidean embedding for a manifold given a pairwise distance matrix of geodesics?

I have a square matrix D (currently represented as a numpy array of shape (572, 572)) plausibly corresponding to pairwise distances between points along the surface of a roughly cylindrical object. I.e., the value D[i,j] corresponds to the minimal length of any path along the surface of that hollow cylinder. How do I construct a 3-dimensional (or n-dimensional) embedding of those 572 points into euclidean space which preserves those geodesic distances?

Current Attempts

Algorithms like locally linear embedding and isomap are able to take that matrix of pairwise geodesic distances and output an embedding so that the pairwise euclidean distances are the same as the original geodesics. While this is not the same task in general, in the case where the output happens to approximate a hypercube in some dimension the desired transformation has actually happened (consider the swiss roll) since the embedding is itself a manifold, so euclidean distance corresponds to geodesic distance.

This is not the case for even slightly more complicated objects like cylinders. By treating geodesic distances as euclidean, antipodal points on the desired cylinder are mapped to locations much further from each other than desired, and the corresponding global optimization problem will often result in a branching structure with ends of the branches corresponding to maximally distant antipodal points, amplifying small perturbations in the random sampling of the cylinder. In general, naive applications of these algorithms doesn't seem to solve the problem at hand.

Another somewhat fruitful (though expensive) approach has been a brute monte carlo technique. I generate random samples from tubelike objects with varying parameters till I find a set of parameters generating geodesic distance matrices similar to mine, up to a permutation (which is dealt with not too inefficiently by solving the linear system converting that distance matrix to mine and testing to see if the result is near a permutation matrix). Then a near-optimal mapping from my 572 points onto that object preserving pairwise distances is performed by finding the nearest permutation matrix to the aforementioned near-permutation matrix.

This is yielding plausible results, but it presupposes the shape of the data and is horrendously expensive. I've performed some of the more obvious optimizations like working with small random samples instead of the entire data set and using gradient-based techniques for parameter estimation, but a more general-purpose technique would be nice.

Caveats

This problem of course does not have a unique solution. Even assuming that manifolds can be unambiguously identified in 3-space from a finite uniform sampling, just squishing a cylinder yields a shape with the same geodesics and different euclidean distances (hence a different embedding). This does not bother me any more than LLE and Isomap yielding differing solutions, and I would be fine with any plausible answer.

With regards to uniquely identifying manifolds from a finite sample, for the sake of argument I would be fine just using the dist_matrix_ attribute from a fitted Isomap class from the scikit-learn package without any special parameters to find the geodesics. That performs an unnecessary MDS step, but it isn't terribly expensive, and it works out of the box. We would then like an embedding which minimizes the frobenius distance between the original geodesic distance matrix and the dist_matrix_ attribute.

like image 655
Hans Musgrave Avatar asked Jun 05 '18 17:06

Hans Musgrave


1 Answers

While I had initially ruled out locally linear embedding and other similar techniques, that seems to have been in haste. Since manifolds are in fact locally linear, a sufficiently well-sampled, sufficiently nice manifold has the property that its small geodesic distances are approximately the same as their corresponding euclidean distances.

With that in mind, any reconstruction which treats the nearest geodesic neighbors as the nearest euclidean neighbors and approximates the euclidean distance via the geodesic distance will approximately preserve global geodesic distance, up to an accumulated error term. This means that all the standard algorithms which only use local distances have the ability to provide an approximately correct embedding. These include and are not limited to

  • Locally Linear Embedding
  • Isomap
  • Spectral Embedding

Some classical embedding algorithms will not work correctly in this application since they attempt to preserve all distances, and the large geodesics are probably poor representations of euclidean distance. For example, multidimensional scaling is a poor fit without modifications.

Note The reason LLE seemed to yield poor results in my preliminary analysis is that one of my assumptions was violated -- the manifold being sufficiently well-sampled. I was applying it to simple shapes with known desired behavior, but I mistakenly used too few points to ensure a quick feedback loop in my analysis. Better-sampled manifolds behave exactly as they're supposed to.

like image 196
Hans Musgrave Avatar answered Oct 12 '22 19:10

Hans Musgrave