Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shortest paths without supernodes in Neo4j

I have a graph with 0.5 billion of nodes and edges in Neo. I want to find shortest path between 2 nodes that avoids supernodes (even if it is longer than paths having supernodes on them).

The below query works fine for smaller graphs, but never finishes for the graph of the size I am dealing with:

MATCH (n:Node { id:'123'}),(m:Node { id:'234' }), p = shortestPath((n)-[*..6]-(m)) 
WHERE NONE(x IN NODES(p) WHERE size((x)--())>1000)
RETURN p

If I remove the WHERE clause it is super fast. Typically subsecond.

How can I speed it up? Would precalculating node degrees and indexing them help? Should I resort to duplicating all the edges apart from the ones adjacent to supernodes, giving them a new label and using them for my shortestPath query without the WHERE clause? Any other suggestions?

like image 353
Tom Avatar asked Mar 27 '17 16:03

Tom


People also ask

What is unwind in neo4j?

With UNWIND , you can transform any list back into individual rows. These lists can be parameters that were passed in, previously collect -ed result or other list expressions. One common usage of unwind is to create distinct lists. Another is to create data from parameter lists that are provided to the query.

Can a graph have more than one shortest path?

In general, there can be multiple shortest paths in a graph. In particular, you can use Djikstra's algorithm to compute shortest paths and this algorithm can return multiple paths from any node A to any other node B.


2 Answers

As far as I can tell the Neo4j shortest path implementation prunes paths when the WHERE ALL contains relationships only (not nodes). Where it cannot prune the queries it finds all the paths then filters them (slow).

As Martin says you can add a label:

MATCH (x:Node)
WHERE size((x)--())>1000
SET n:Supernode

And then interrogate the nodes' label via the edges:

MATCH p = shortestPath((n:Node { id:'1'})-[*..6]-(m:Node { id:'2' })) 
WHERE ALL( rel IN relationships(p) WHERE not (startNode(rel):Supernode or endNode(rel):Supernode))
RETURN p

This will allow Neo4j to use the optimised, bi-directional, breadth-first (fast) query.

Some more reading here: https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/shortestpath-planning/

like image 146
Adam Avatar answered Oct 21 '22 14:10

Adam


You could also try to add a label for supernodes:

MATCH (x:Node)
WHERE size((x)--())>1000
SET n:Supernode

Does this run and finish on your data? How many supernodes and normal nodes do you have?

Then try:

MATCH (n:Node { id:'123'}),(m:Node { id:'234' })
WITH n, m
MATCH p = shortestPath((n)-[*..6]-(m))
WHERE NONE(x IN NODES(p) WHERE (x:Supernode))
RETURN p

I suppose a label check is faster.

like image 2
Martin Preusse Avatar answered Oct 21 '22 15:10

Martin Preusse