
What is the correct input to scikit-learn's MDS?

I'm hoping this is the correct place to post; if not, I am willing to move it to SO.

In any case, I am using MDS to help me find a 2-D representation of a dataset. These are pKa values of amino acid residues across many years' worth of protein data; at their core, they are decimal numbers on the same scale. There are many positions (~600 rows) and many years (~12 columns).

My question is this: is the correct input to MDS the data matrix (years vs positions), or can I pass in the correlation matrix (year vs year)? I ask because the API docs conflict with the written description.

The API docs say it takes a data matrix of shape (n_samples, n_features): http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html#sklearn.manifold.MDS

The written description says it takes "the input similarity matrix": http://scikit-learn.org/stable/modules/manifold.html

asked Dec 25 '22 05:12 by ericmjl



1 Answer

If you pass dissimilarity='euclidean' to the estimator's constructor (or leave it out, since that is the default), fit/fit_transform takes a data matrix of shape (n_samples, n_features) and computes the Euclidean distance matrix between its rows for you.
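
A minimal sketch of the default mode, using random numbers as a hypothetical stand-in for the ~600 x 12 pKa table (the array name pka and its values are illustrative, not the asker's data). MDS places one point per row of its input, so to get a 2-D map of the years the matrix is transposed to shape (12, 600):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical stand-in data: ~600 positions (rows) x ~12 years (columns)
# of pKa-like decimal values on the same scale.
rng = np.random.default_rng(0)
pka = rng.normal(loc=7.0, scale=1.5, size=(600, 12))

# MDS embeds the rows of its input. To get one 2-D point per year,
# treat years as samples: shape becomes (12 samples, 600 features).
years_as_samples = pka.T

# No dissimilarity argument: the default ('euclidean') makes fit_transform
# treat the input as an (n_samples, n_features) data matrix and compute
# pairwise Euclidean distances between rows internally.
# n_init=4 is set explicitly so the run does not depend on version defaults.
mds = MDS(n_components=2, n_init=4, random_state=0)
year_coords = mds.fit_transform(years_as_samples)

print(year_coords.shape)  # one (x, y) pair per year
```

Passing pka without the transpose would instead give one point per position (600 points).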

If you pass dissimilarity='precomputed', it takes a square (n_samples, n_samples) dissimilarity matrix instead.
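
For the correlation-matrix route asked about in the question: a correlation matrix is a similarity (1 means identical), while dissimilarity='precomputed' expects distances (0 means identical), so it has to be converted first. A minimal sketch on hypothetical random data; the sqrt(2 * (1 - r)) conversion is one common choice assumed here, not something the scikit-learn docs prescribe:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical stand-in data: ~600 positions (rows) x ~12 years (columns).
rng = np.random.default_rng(0)
pka = rng.normal(loc=7.0, scale=1.5, size=(600, 12))

# Year-vs-year Pearson correlation matrix, shape (12, 12).
corr = np.corrcoef(pka, rowvar=False)

# Convert similarity -> dissimilarity. sqrt(2 * (1 - r)) is proportional to
# the Euclidean distance between z-scored columns (an assumed choice;
# 1 - r is another common option).
dissim = np.sqrt(2.0 * (1.0 - corr))
dissim = (dissim + dissim.T) / 2.0  # remove floating-point asymmetry
np.fill_diagonal(dissim, 0.0)       # each year is at distance 0 from itself

# 'precomputed': the input is read as an (n_samples, n_samples) dissimilarity
# matrix, so no further distance computation happens inside MDS.
mds = MDS(n_components=2, dissimilarity="precomputed", n_init=4, random_state=0)
year_coords = mds.fit_transform(dissim)

print(year_coords.shape)  # one (x, y) pair per year
```

Passing corr directly with 'precomputed' would run without error but treat highly correlated years as far apart, which is the opposite of what is wanted.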

The docs are indeed not super-clear on this, though; I'm sure a pull request adding a brief note to the description of the X argument, and clarifying that 'euclidean' is the default (I had to check the source), would be accepted.

answered Dec 29 '22 07:12 by Danica
