I have a dissimilarity matrix on which I would like to perform multidimensional scaling (MDS) using sklearn.manifold.MDS. The dissimilarity between some elements in this matrix is not meaningful, so I am wondering whether there is a way to run MDS on a sparse matrix or on a matrix with missing values. According to this question, dissimilarities of 0 are treated as missing values, but I was unable to find this statement in the official documentation. Isn't a dissimilarity of 0 interpreted as two points that are very close to each other?
Any suggestions on how to obtain a lower-dimensional representation of my high-dimensional dataset based on a sparse dissimilarity matrix would be welcome. Thanks!
The Euclidean distance between two points is calculated using the Pythagorean theorem (c² = a² + b²), although it becomes somewhat more complicated for n-dimensional space (see "Euclidean Distance in n-dimensional space"). This results in the similarity matrix. Compare the similarity matrix with the original input matrix by evaluating the stress function.
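As a rough illustration of the two steps above, here is a minimal sketch in plain Python. The function names (euclidean, distance_matrix, raw_stress) are made up for this example, and the "raw stress" used here (sum of squared differences over all pairs) is only one common form of the stress function, assumed for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(points):
    """Pairwise Euclidean distances between all points."""
    return [[euclidean(p, q) for q in points] for p in points]

def raw_stress(dissimilarities, points):
    """Sum of squared differences between the embedded distances and the
    input dissimilarities, counting each pair (i < j) once.
    Assumption: this is the plain 'raw stress', not a normalized variant."""
    d = distance_matrix(points)
    n = len(points)
    return sum(
        (d[i][j] - dissimilarities[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

# A 3-4-5 triangle: the 2-D configuration reproduces the input exactly.
target = [[0, 5, 3], [5, 0, 4], [3, 4, 0]]
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
stress = raw_stress(target, points)
```

A stress of zero means the configuration reproduces the input dissimilarities exactly; larger values mean a worse fit.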
Multidimensional scaling (MDS) is a technique that creates a map displaying the relative positions of a number of objects, given only a table of the distances between them. The map may consist of one, two, three, or even more dimensions. The program calculates either the metric or the non-metric solution.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration of points mapped into an abstract Cartesian space.
Metric MDS attempts to model the similarity/dissimilarity of data by calculating distances between each pair of points using their geometric coordinates. The key here is the ability to measure a distance using a linear scale. E.g., a distance of 10 units would be considered twice as far as a distance of 5 units.
Thanks for the hint to that question! I looked into the code:
For off-diagonal zeros to be interpreted as missing values, you need to use the non-metric version of MDS (based on the SMACOF algorithm), i.e. MDS(metric=False).
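To make the "zero means missing" idea concrete, here is a small sketch in plain Python. It is not scikit-learn's implementation; observed_pairs and masked_stress are hypothetical helpers that only show what it means for off-diagonal zeros to be left out of the fit.

```python
import math

def observed_pairs(dissim):
    """Return (i, j, d) for every pair i < j whose dissimilarity is non-zero.
    Assumption: an off-diagonal zero marks a missing value."""
    n = len(dissim)
    return [
        (i, j, dissim[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if dissim[i][j] != 0
    ]

def masked_stress(dissim, points):
    """Squared error between embedded distances and dissimilarities,
    summed over observed pairs only; missing pairs contribute nothing."""
    return sum(
        (math.dist(points[i], points[j]) - d) ** 2
        for i, j, d in observed_pairs(dissim)
    )

# The dissimilarity between items 0 and 2 is unknown (encoded as 0).
dissim = [[0, 5, 0], [5, 0, 4], [0, 4, 0]]
pairs = observed_pairs(dissim)
fit = masked_stress(dissim, [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)])
```

Under this convention a true dissimilarity of exactly 0 between two distinct items cannot be expressed, which is the ambiguity raised in the question.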
I have the same issue, and so far the only alternative I see is to do matrix completion on the distance matrix before applying MDS.
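One possible way to do such a completion, sketched here as an assumption rather than as the method the answer had in mind, is to fill each missing entry with the shortest-path distance through the known entries (the same idea Isomap uses for geodesic distances). The function name complete_by_shortest_paths and the use of None as the missing-value marker are choices made for this example.

```python
import math

def complete_by_shortest_paths(dissim):
    """Fill missing entries (None) of a symmetric dissimilarity matrix with
    shortest-path distances, using the Floyd-Warshall algorithm.
    Note: an observed entry is also shortened if a detour through other
    items is shorter, i.e. if it violates the triangle inequality."""
    n = len(dissim)
    dist = [
        [
            0.0 if i == j
            else math.inf if dissim[i][j] is None
            else float(dissim[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# The dissimilarity between items 0 and 2 is missing.
dissim = [[0, 1, None], [1, 0, 2], [None, 2, 0]]
completed = complete_by_shortest_paths(dissim)
```

Here the missing entry becomes 1 + 2 = 3. Pairs with no connecting path stay at infinity, so the result should be checked before passing it to MDS.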