I have a wireless mesh network of nodes, each of which is capable of reporting its 'distance' to its neighbors, measured in (simplified) signal strength to them. The nodes are geographically in 3d space but because of radio interference, the distance between nodes need not be trigonometrically (trigonomically?) consistent. I.e., given nodes A, B and C, the distance between A and B might be 10, between A and C also 10, yet between B and C 100.
What I want to do is visualize the logical network layout in terms of connectness of nodes, i.e. include the logical distance between nodes in the visual.
So far my research has shown the multidimensional scaling (MDS) is designed for exactly this sort of thing. Given that my data can be directly expressed as a 2d distance matrix, it's even a simpler form of the more general MDS.
Now, there seem to be many MDS algorithms, see e.g. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html and http://tapkee.lisitsyn.me/ . I need to do this in C++ and I'm hoping I can use a ready-made component, i.e. not have to re-implement an algo from a paper. So, I thought this: https://sites.google.com/site/simpmatrix/ would be the ticket. And it works, but:
The layout is not stable, i.e. every time the algorithm is re-run, the position of the nodes changes (see differences between image 1 and 2 below - this is from having been run twice, without any further changes). This is due to the initialization matrix (which contains the initial location of each node, which the algorithm then iteratively corrects) that is passed to this algorithm - I pass an empty one and then the implementation derives a random one. In general, the layout does approach the layout I expected from the given input data. Furthermore, between different runs, the direction of nodes (clockwise or counterclockwise) can change. See image 3 below.
The 'solution' I thought was obvious, was to pass a stable default initialization matrix. But when I put all nodes initially in the same place, they're not moved at all; when I put them on one axis (node 0 at 0,0 ; node 1 at 1,0 ; node 2 at 2,0 etc.), they are moved along that axis only. (see image 4 below). The relative distances between them are OK, though.
So it seems like this algorithm only changes distance between nodes, but doesn't change their location.
Thanks for reading this far - my questions are (I'd be happy to get just one or a few of them answered as each of them might give me a clue as to what direction to continue in):
If all else fails, my next option is going to be to use the algorithm I mentioned above, increase the number of iterations to keep the variability between runs at around a few pixels (I'd have to experiment with how many iterations that would take), then 'rotate' each node around node 0 to, for example, align nodes 0 and 1 on a horizontal line from left to right; that way, I would 'correct' the location of the points after their relative distances have been determined by the MDS algorithm. I would have to correct for the order of connected nodes (clockwise or counterclockwise) around each node as well. This might become hairy quite quickly.
Obviously I'd prefer a stable algorithmic solution - increasing iterations to smooth out the randomness is not very reliable.
Thanks.
EDIT: I was referred to cs.stackexchange.com and some comments have been made there; for algorithmic suggestions, please see https://cs.stackexchange.com/questions/18439/stable-multi-dimensional-scaling-algorithm .
Image 1 - with random initialization matrix:
Image 2 - after running with same input data, rotated when compared to 1:
Image 3 - same as previous 2, but nodes 1-3 are in another direction:
Image 4 - with the initial layout of the nodes on one line, their position on the y axis isn't changed:
For example, given a matrix of perceived similarities between various brands of air fresheners, MDS plots the brands on a map such that those brands that are perceived to be very similar to each other are placed near each other on the map, and those brands that are perceived to be very different from each other are ...
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration of. points mapped into an abstract Cartesian space.
What is multidimensional scaling in machine learning? Multidimensional Scaling (MDS) is a family of mathematical models that can be utilized for the purpose of analyzing and visualizing the distances between objects, where the distance is known between pairs of these objects.
Multidimensional scaling refers to a family of mathematical (not statistical) models that can be used to analyze distances between objects (e.g., health states). Information contained in a set of data is represented by a set of points in space.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configuration of n points mapped into an abstract Cartesian space.
The data used for multidimensional scaling (MDS) are dissimilarities between pairs of objects. The main objective of MDS is to represent these dissimilarities as distances between points in a low dimensional space such that the distances correspond as closely as possible to the dissimilarities.
Generalized multidimensional scaling (GMD) An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.
A big drawback of those first two Multidimensional Scaling methods is the fact that they both start from 1 distance/dissimilarity matrix. In real-life data analysis, it is much more common to have multiple estimations of distance.
Most scaling algorithms effectively set "springs" between nodes, where the resting length of the spring is the desired length of the edge. They then attempt to minimize the energy of the system of springs. When you initialize all the nodes on top of each other though, the amount of energy released when any one node is moved is the same in every direction. So the gradient of energy with respect to each node's position is zero, so the algorithm leaves the node where it is. Similarly if you start them all in a straight line, the gradient is always along that line, so the nodes are only ever moved along it.
(That's a flawed explanation in many respects, but it works for an intuition)
Try initializing the nodes to lie on the unit circle, on a grid or in any other fashion such that they aren't all co-linear. Assuming the library algorithm's update scheme is deterministic, that should give you reproducible visualizations and avoid degeneracy conditions.
If the library is non-deterministic, either find another library which is deterministic, or open up the source code and replace the randomness generator with a PRNG initialized with a fixed seed. I'd recommend the former option though, as other, more advanced libraries should allow you to set edges you want to "ignore" too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With