I have a graph with 0.5 billion of nodes and edges in Neo. I want to find shortest path between 2 nodes that avoids supernodes (even if it is longer than paths having supernodes on them).
The below query works fine for smaller graphs, but never finishes for the graph of the size I am dealing with:
MATCH (n:Node { id:'123'}),(m:Node { id:'234' }), p = shortestPath((n)-[*..6]-(m))
WHERE NONE(x IN NODES(p) WHERE size((x)--())>1000)
RETURN p
If I remove the WHERE clause it is super fast. Typically subsecond.
How can I speed it up? Would precalculating node degrees and indexing them help? Should I resort to duplicating all the edges apart from the ones adjacent to supernodes, giving them a new label and using them for my shortestPath query without the WHERE clause? Any other suggestions?
With UNWIND , you can transform any list back into individual rows. These lists can be parameters that were passed in, previously collect -ed result or other list expressions. One common usage of unwind is to create distinct lists. Another is to create data from parameter lists that are provided to the query.
In general, there can be multiple shortest paths in a graph. In particular, you can use Djikstra's algorithm to compute shortest paths and this algorithm can return multiple paths from any node A to any other node B.
As far as I can tell the Neo4j shortest path implementation prunes paths when the WHERE ALL contains relationships only (not nodes). Where it cannot prune the queries it finds all the paths then filters them (slow).
As Martin says you can add a label:
MATCH (x:Node)
WHERE size((x)--())>1000
SET n:Supernode
And then interrogate the nodes' label via the edges:
MATCH p = shortestPath((n:Node { id:'1'})-[*..6]-(m:Node { id:'2' }))
WHERE ALL( rel IN relationships(p) WHERE not (startNode(rel):Supernode or endNode(rel):Supernode))
RETURN p
This will allow Neo4j to use the optimised, bi-directional, breadth-first (fast) query.
Some more reading here: https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/shortestpath-planning/
You could also try to add a label for supernodes:
MATCH (x:Node)
WHERE size((x)--())>1000
SET n:Supernode
Does this run and finish on your data? How many supernodes and normal nodes do you have?
Then try:
MATCH (n:Node { id:'123'}),(m:Node { id:'234' })
WITH n, m
MATCH p = shortestPath((n)-[*..6]-(m))
WHERE NONE(x IN NODES(p) WHERE (x:Supernode))
RETURN p
I suppose a label check is faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With