I have a complete undirected graph where nodes represent points on a plane, and edge weights are approximate Euclidean distances between the points. I would like to "embed" this graph in a two-dimensional space. That is, I want to convert each vertex to an (x, y) position tuple so that for any two vertices v and w, the weight of the edge (v, w) is close to dist(v, w).
For example, if I had the graph with nodes A, B, C, and D and edges with weights (A, B): 20; (A, C): 22; (A, D): 26; (B, C): 30; (B, D): 20; (C, D): 19, then you could assign the points A: (0, 0); B: (10, 0); C: (0, 10); D: (10, 10). Clearly this is imperfect, but it is a reasonable approximation.
I don't care about getting the best possible solution, I just want a reasonable one in a reasonable amount of time.
(In case you want the motivation for this problem: I have a physical system where I have noisy measurements of the distances between all pairs of points. The measurements are noisy but tend to be within a factor of two of the true value. Having made all of these measurements, I now have a graph with several thousand nodes and several million edges, and I want to place the points on a plane.)
A graph embedding, sometimes also called a graph drawing, is a particular drawing of a graph. Graph embeddings are most commonly drawn in the plane, but may also be constructed in three or more dimensions. A graph can be embedded on the surface of a sphere without crossings if and only if it can be embedded in the plane without crossings, since the plane is topologically a sphere with a single point (say, the North Pole) removed.
You may be able to adapt the force-based graph drawing algorithm for your needs.
This algorithm attempts to find a good layout for an undirected graph G(V, E) by treating each vertex in V as a Cartesian point and each edge in E as a linear spring. Additionally, a pair-wise repulsive force (i.e. Coulomb's law) is calculated between vertices globally - this prevents the clustering in Cartesian space of vertices that are non-adjacent in G(V, E).
In your case you could set the equilibrium length of the springs equal to your edge weights - this should give a layout with pair-wise Euclidean vertex distances close to your edge weights.
The algorithm updates an initial distribution (possibly random) in a pseudo-time stepping fashion based on the sum of forces at each vertex. The algorithm terminates when an approximate steady-state is reached. A simplified pseudo-code:
while (not converged)
    for i = vertices in V
        F(i) = sum of spring + repulsive forces on ith vertex
    endfor
    update vertex positions based on the force vectors F
    if (vertex positions not changing much)
        converged = true
    endif
endwhile
There are a number of optimisations that can be applied to reduce the complexity of the algorithm. For instance, a spatial index (such as a quadtree) can be used to allow for efficient calculation of an approximate repulsive force between "near-by" vertices, rather than the slow global calculation. It's also possible to use multi-level graph agglomeration techniques to improve convergence and optimality.
Finally, note that there are several good libraries for graph drawing that implement optimised versions of this algorithm - you might want to check out Graphviz for instance.
For starters, I think I'd go for a heuristic search approach.
You actually want to find a set of points p_1, p_2, ..., p_n that minimizes the function:

f(X) = Sum (|dist(p_i, p_j) - weight(v_i, v_j)|) [for each pair i < j]
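As a concrete sketch, with points indexed 0..n-1 and each pair's weight stored once (for i < j), f can be computed as:

```python
import math

def f(X, weight):
    """X: list of (x, y) points; weight[(i, j)], i < j: target distance."""
    return sum(abs(math.dist(X[i], X[j]) - w) for (i, j), w in weight.items())

# The example layout from the question:
X = [(0, 0), (10, 0), (0, 10), (10, 10)]
weight = {(0, 1): 20, (0, 2): 22, (0, 3): 26,
          (1, 2): 30, (1, 3): 20, (2, 3): 19}
print(f(X, weight))  # total absolute error of that layout
```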
The problem can be heuristically solved by some algorithms including Hill Climbing and Genetic Algorithms.
I personally like Hill Climbing, and the approach is as follows:
best <- [(0,0), (0,0), ..., (0,0)]
while there is still time:
    S <- randomly initialized vector of points
    flag <- true
    while flag:
        flag <- false
        candidates <- next(S) (*)
        S <- X in candidates such that f(X) <= f(Y) for each Y in candidates (**)
        if f(S) was improved:
            flag <- true
    if f(S) <= f(best):
        best <- S
return best
(*)
next() generates a list of candidates. It can utilize information about the gradient of the function (and essentially decay into something similar to gradient descent), for example, or it can sample a few random 'directions' and offer them as candidates (all in the multi-dimensional vector space, where each point contributes two coordinates).
(**)
Here you basically choose the "best" candidate and store it in S, so that you continue from it in the next iteration.
Note that the algorithm is an anytime algorithm, so it is expected to get better the more time you give it. This behavior is achieved by the random initialization of the starting point - which is likely to change the end result - and by the random selection of candidate points.
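A compact Python sketch of the random-restart hill climbing described above. Here next(S) is implemented as "perturb one randomly chosen point"; the step size, candidate count, and restart count are arbitrary tuning assumptions:

```python
import math
import random

def hill_climb(n, weight, restarts=10, step=1.0, n_candidates=16):
    """Random-restart hill climbing. weight[(i, j)] (i < j) is the target
    distance between points i and j."""
    def f(X):
        return sum(abs(math.dist(X[i], X[j]) - w) for (i, j), w in weight.items())

    scale = max(weight.values())
    best, best_f = None, math.inf
    for _ in range(restarts):
        S = [(random.uniform(0, scale), random.uniform(0, scale))
             for _ in range(n)]
        f_S = f(S)
        improved = True
        while improved:
            improved = False
            # next(S): perturb one randomly chosen point in a random direction.
            for _ in range(n_candidates):
                C = list(S)
                k = random.randrange(n)
                C[k] = (C[k][0] + random.uniform(-step, step),
                        C[k][1] + random.uniform(-step, step))
                f_C = f(C)
                if f_C < f_S:  # keep the best candidate seen so far (**)
                    S, f_S, improved = C, f_C, True
        if f_S < best_f:
            best, best_f = S, f_S
    return best, best_f
```

At several thousand points, evaluating f over all pairs on every candidate is the bottleneck; in practice you would evaluate only the terms touching the perturbed point.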