Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ArangoDB: Get every node, which is in any way related to a selected node

I have a simple node-links graph in ArangoDB. How can I traverse from 1 preselected node and return all nodes which are related to it?

For example: A→B, B→C, C→D, C→E, F→B, F→E

Selecting any of them should return the same result (all of them).

I am very new to ArangoDB.

like image 618
Loredra L Avatar asked Aug 03 '16 12:08

Loredra L


1 Answers

What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.


AQL traversal let's you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as well as the edge that led you to this vertex and the full path from start to finish including both, vertices and edges.

In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node and an edge collection links and they contain the data for this graph:

Example Graph

// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
    // return the key (node name) for every vertex we see
    RETURN v._key

This will only return [ "B" ], because the traversal depth is implicitly 1..1 (min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:

FOR v IN 1..10 OUTBOUND "node/A" links
    RETURN v._key

This will give us [ "B", "C", "D", "E"]. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND, but in your case, we want to ignore the direction of the edge and follow anyway:

FOR v IN 1..10 ANY "node/A" links
    RETURN v._key

The result might be a bit surprising at first:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]

We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"} to do so.)

It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):

FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
    RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)

Result:

[
  "A - B",
  "A - B - C",
  "A - B - C - D",
  "A - B - C - E",
  "A - B - C - E - F",
  "A - B - C - E - F - B",
  "A - B - F",
  "A - B - F - E",
  "A - B - F - E - C",
  "A - B - F - E - C - D",
  "A - B - F - E - C - B"
]

There is a cycle in the graph, but there can't be an infinite loop because the maximum depth is exceeded after 10 hops. But as you can see above, it doesn't even reach the depth of 10, it rather stops because the (default) option is to not follow edges twice per path (uniqueEdges: "path").

Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT, COLLECT or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.

uniqueEdges: "global" would still include the B node twice, but uniqueVertices: "global" gives the desired result. In addition, bfs: true for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.

There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0] for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:

FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
    RETURN v._key

[ "A", "B", "C", "D", "E", "F" ]

To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:

FOR doc IN node
    RETURN (
        FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
            SORT v._key
            RETURN v._key
    )

All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)

like image 82
CodeManX Avatar answered Nov 08 '22 05:11

CodeManX