
Multidimensional scaling with missing values in dissimilarity matrix

I have a dissimilarity matrix on which I would like to perform multidimensional scaling (MDS) using sklearn.manifold.MDS. The dissimilarity between some pairs of elements in this matrix is not meaningful, so I am wondering whether there is a way to run MDS on a sparse matrix or on a matrix with missing values. According to this question, dissimilarities of 0 are treated as missing values, but I was unable to find this statement in the official documentation. Isn't a dissimilarity of 0 interpreted as two points being very close to each other?

Any suggestions on how to obtain a lower-dimensional representation of my high-dimensional dataset from a sparse dissimilarity matrix would be welcome. Thanks!

Nadja Herger, asked Apr 21 '17 11:04




1 Answer

Thanks for the pointer to that question! I looked into the code: in sklearn's SMACOF implementation, off-diagonal zeros are treated as missing values only in the non-metric variant, which you select with MDS(metric=False).
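A minimal sketch of what that looks like, assuming a symmetric precomputed matrix where the "not meaningful" pairs have been set to 0. It calls sklearn.manifold.smacof directly (the routine MDS runs internally) with metric=False; the toy data, the choice of missing pairs, and the n_init/random_state values are made up for illustration.

```python
import numpy as np
from sklearn.manifold import smacof

# Hypothetical toy data: 12 points in 5 dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(12, 5))

# Full Euclidean dissimilarity matrix (symmetric, zero diagonal).
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# Mark a few pairs as "not meaningful" by writing 0 into both
# symmetric entries (the pairs themselves are arbitrary).
missing_pairs = [(0, 5), (2, 9), (3, 7)]
for i, j in missing_pairs:
    D[i, j] = D[j, i] = 0.0

# Non-metric SMACOF: off-diagonal zeros are left out of the
# monotone regression instead of being fitted as "distance 0".
embedding, stress = smacof(
    D, metric=False, n_components=2, n_init=4, random_state=0
)

print(embedding.shape)
```

Note that with the default metric=True the same zeros would be fitted as real distances of 0, so the affected points would be pulled together.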

I have the same issue, and so far the only alternative I see is to run matrix completion on the distance matrix before applying MDS.
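One simple way to do such a completion, sketched here as an assumption rather than a recommendation, is to replace each missing entry with the shortest-path distance through the known entries (the same idea Isomap uses for geodesic distances). The helper name complete_by_shortest_paths and the boolean mask argument are hypothetical; other completion methods (e.g. low-rank ones) may suit your data better.

```python
import numpy as np

def complete_by_shortest_paths(D, missing):
    """Fill entries flagged in `missing` with shortest-path distances.

    D       : (n, n) symmetric dissimilarity matrix
    missing : (n, n) boolean mask, True where D is not meaningful
    Known entries are returned unchanged. Entries that cannot be
    reached through known pairs stay at np.inf.
    """
    D = np.asarray(D, dtype=float)
    G = D.copy()
    G[missing] = np.inf          # missing pairs have no direct edge
    np.fill_diagonal(G, 0.0)
    n = G.shape[0]
    # Floyd-Warshall: allow paths through intermediate point k.
    for k in range(n):
        G = np.minimum(G, G[:, k, None] + G[None, k, :])
    # Keep the original values wherever they were known.
    return np.where(missing, G, D)

# Four points on a line at positions 0, 1, 3, 6.
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])

# Pretend the dissimilarity between the first and last point is unknown.
missing = np.zeros_like(D, dtype=bool)
missing[0, 3] = missing[3, 0] = True

filled = complete_by_shortest_paths(D, missing)
print(filled[0, 3])  # 6.0 (path 0 -> 1 -> 3: 1 + 5)
```

The completed matrix can then be passed to MDS as a precomputed dissimilarity matrix. Shortest paths can only overestimate a Euclidean distance, so treat the filled values as upper bounds.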

Jojo, answered Oct 19 '22 05:10