I have a simple node-links graph in ArangoDB. How can I traverse from 1 preselected node and return all nodes which are related to it?
For example: A→B, B→C, C→D, C→E, F→B, F→E
Selecting any of them should return the same result (all of them).
I am very new to ArangoDB.
What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.
AQL traversal let's you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as well as the edge that led you to this vertex and the full path from start to finish including both, vertices and edges.
In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node
and an edge collection links
and they contain the data for this graph:
// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
// return the key (node name) for every vertex we see
RETURN v._key
This will only return [ "B" ]
, because the traversal depth is implicitly 1..1
(min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:
FOR v IN 1..10 OUTBOUND "node/A" links
RETURN v._key
This will give us [ "B", "C", "D", "E"]
. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND
, but in your case, we want to ignore the direction of the edge and follow anyway:
FOR v IN 1..10 ANY "node/A" links
RETURN v._key
The result might be a bit surprising at first:[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]
We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v
. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"}
to do so.)
It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):
FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)
Result:
[
"A - B",
"A - B - C",
"A - B - C - D",
"A - B - C - E",
"A - B - C - E - F",
"A - B - C - E - F - B",
"A - B - F",
"A - B - F - E",
"A - B - F - E - C",
"A - B - F - E - C - D",
"A - B - F - E - C - B"
]
There is a cycle in the graph, but there can't be an infinite loop because the maximum depth is exceeded after 10 hops. But as you can see above, it doesn't even reach the depth of 10, it rather stops because the (default) option is to not follow edges twice per path (uniqueEdges: "path"
).
Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT
, COLLECT
or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.
uniqueEdges: "global"
would still include the B node twice, but uniqueVertices: "global"
gives the desired result. In addition, bfs: true
for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.
There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0]
for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:
FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
RETURN v._key
[ "A", "B", "C", "D", "E", "F" ]
To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:
FOR doc IN node
RETURN (
FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
SORT v._key
RETURN v._key
)
All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With