Shortest paths without supernodes in Neo4j

Tags:

graph

shortest-path

neo4j

cypher

I have a graph with 0.5 billion nodes and edges in Neo4j. I want to find the shortest path between two nodes that avoids supernodes (even if it is longer than paths that pass through supernodes).

The query below works fine for smaller graphs, but never finishes on a graph of the size I am dealing with:

MATCH (n:Node { id:'123'}),(m:Node { id:'234' }), p = shortestPath((n)-[*..6]-(m)) 
WHERE NONE(x IN NODES(p) WHERE size((x)--())>1000)
RETURN p

If I remove the WHERE clause, it is super fast, typically subsecond.

How can I speed it up? Would precalculating node degrees and indexing them help? Should I resort to duplicating all the edges except the ones adjacent to supernodes, giving them a new relationship type and using them for my shortestPath query without the WHERE clause? Any other suggestions?
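The edge-duplication idea amounts to searching a subgraph that contains no edge touching a supernode. A plain, unfiltered shortest path on that subgraph is then a shortest supernode-free path, provided neither endpoint is itself a supernode. Below is a minimal in-memory sketch in Python. It assumes a toy undirected graph stored as an adjacency dict. The function names, the graph and the tiny threshold are hypothetical, and this is not Neo4j code:

```python
from collections import deque

# Toy undirected graph: hub "H" is the only high-degree node.
TOY_GRAPH = {
    "s": ["H", "x"], "t": ["H", "y"], "x": ["s", "y"], "y": ["x", "t"],
    "H": ["s", "t", "p", "q", "r"], "p": ["H"], "q": ["H"], "r": ["H"],
}

def supernodes(adj, threshold):
    """Nodes whose degree exceeds threshold (analogue of size((x)--()) > 1000)."""
    return {v for v, nbrs in adj.items() if len(nbrs) > threshold}

def filtered_subgraph(adj, threshold):
    """Copy of adj that keeps only edges with no supernode at either end."""
    heavy = supernodes(adj, threshold)
    return {v: [w for w in nbrs if w not in heavy]
            for v, nbrs in adj.items() if v not in heavy}

def bfs_shortest_path(adj, start, goal, max_hops=6):
    """Plain, unconstrained breadth-first shortest path; returns a node list or None."""
    if start not in adj or goal not in adj:
        return None
    parent = {start: None}
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if v == goal:
            path = []
            while v is not None:  # walk parent pointers back to start
                path.append(v)
                v = parent[v]
            return path[::-1]
        if depth < max_hops:  # respect the [*..max_hops] bound
            for w in adj[v]:
                if w not in parent:
                    parent[w] = v
                    queue.append((w, depth + 1))
    return None
```

On the full graph, `bfs_shortest_path(TOY_GRAPH, "s", "t")` takes the two-hop route through `H`. On `filtered_subgraph(TOY_GRAPH, 3)` the same call returns the three-hop detour `["s", "x", "y", "t"]`. The cost is keeping a second copy of the edges in sync.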


asked Mar 27 '17 16:03

Tom

2 Answers

As far as I can tell, the Neo4j shortest path implementation can only prune paths during the search when the WHERE ALL(...) predicate is on relationships (not nodes). When it cannot prune, it finds all the paths and then filters them, which is slow.
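To see why the two strategies differ in cost, here is a toy Python model. It is an assumption-laden illustration, not Neo4j's actual implementation. `pruned_bfs` applies the node predicate while expanding the breadth-first frontier, so rejected nodes are never expanded. `enumerate_then_filter` lists every simple path up to the hop limit and filters afterwards, roughly like the exhaustive fallback described above. The graph, the threshold of 3 and all names are hypothetical:

```python
from collections import deque

# Toy undirected graph: hub "H" (degree 5) sits on the only 2-hop path from s to t.
TOY_GRAPH = {
    "s": ["H", "x"], "t": ["H", "y"], "x": ["s", "y"], "y": ["x", "t"],
    "H": ["s", "t", "p", "q", "r"], "p": ["H"], "q": ["H"], "r": ["H"],
}

def is_allowed(adj, node, threshold=3):
    """Node predicate: degree must not exceed threshold (toy stand-in for 1000)."""
    return len(adj[node]) <= threshold

def pruned_bfs(adj, start, goal, max_hops=6):
    """Apply the predicate during the search: rejected nodes are never expanded."""
    if not (is_allowed(adj, start) and is_allowed(adj, goal)):
        return None
    parent = {start: None}
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if v == goal:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        if depth < max_hops:
            for w in adj[v]:
                if w not in parent and is_allowed(adj, w):
                    parent[w] = v
                    queue.append((w, depth + 1))
    return None

def enumerate_then_filter(adj, start, goal, max_hops=6):
    """Exhaustive fallback: list all simple paths, then filter.

    Returns (shortest valid path or None, number of partial paths visited).
    """
    found, visited = [], 0
    stack = [[start]]
    while stack:
        path = stack.pop()
        visited += 1
        if path[-1] == goal:
            found.append(path)
            continue
        if len(path) - 1 < max_hops:
            for w in adj[path[-1]]:
                if w not in path:  # simple paths only
                    stack.append(path + [w])
    valid = [p for p in found if all(is_allowed(adj, x) for x in p)]
    return (min(valid, key=len) if valid else None), visited
```

Both return `["s", "x", "y", "t"]`, but the exhaustive version also walks every branch through `H`. On a real graph the number of such branches multiplies with each hub's fan-out and with the hop limit. That is consistent with the filtered query never finishing.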

As Martin suggests, you can add a label:

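// Compute each node's degree once and store the result as a label
MATCH (x:Node)
WHERE size((x)--()) > 1000  // on Neo4j 5.x, write COUNT { (x)--() } > 1000 instead
SET x:Supernode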
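The label helps because the degree is computed once, up front, instead of being recomputed for every node of every candidate path. Below is a toy Python analogue of that precomputation, with a set standing in for the `:Supernode` label. The names, the edge list and the small threshold are assumptions for illustration only:

```python
from collections import Counter

def degree_counts(edges):
    """Degree of every node, computed in a single pass over an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def label_supernodes(edges, threshold=1000):
    """Set of nodes that would receive the :Supernode label (degree > threshold)."""
    return {v for v, d in degree_counts(edges).items() if d > threshold}

# Toy edge list: "H" touches five edges, every other node at most two.
EDGES = [("s", "H"), ("t", "H"), ("p", "H"), ("q", "H"), ("r", "H"),
         ("s", "x"), ("x", "y"), ("y", "t")]

SUPERNODES = label_supernodes(EDGES, threshold=3)

def is_supernode(node):
    """During a search, the per-node check is now a set lookup, not a degree count."""
    return node in SUPERNODES
```

The one-off pass is linear in the number of edges. Each later check is constant time, which is the same trade the label makes in the database.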

Then check the label of each relationship's start and end node:

MATCH p = shortestPath((n:Node { id:'123'})-[*..6]-(m:Node { id:'234' }))
WHERE ALL(rel IN relationships(p)
          WHERE NOT (startNode(rel):Supernode OR endNode(rel):Supernode))
RETURN p

This allows Neo4j to use the optimised, bidirectional, breadth-first (fast) search.
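Here is a rough sketch of the general technique in Python over a toy adjacency dict: a bidirectional breadth-first search that evaluates a relationship predicate as each relationship is traversed. It assumes an undirected, unweighted graph and a symmetric predicate. `rel_ok`, `no_supernode` and the other names are hypothetical, and this is an illustration, not Neo4j's implementation:

```python
# Toy undirected graph: hub "H" is the only supernode.
TOY_GRAPH = {
    "s": ["H", "x"], "t": ["H", "y"], "x": ["s", "y"], "y": ["x", "t"],
    "H": ["s", "t", "p", "q", "r"], "p": ["H"], "q": ["H"], "r": ["H"],
}
HEAVY = {"H"}

def no_supernode(a, b):
    """Relationship predicate: neither endpoint is a supernode."""
    return a not in HEAVY and b not in HEAVY

def _walk(parent, node):
    """Follow parent pointers from node back to the search root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def bidirectional_bfs(adj, start, goal, rel_ok, max_hops=6):
    """Shortest path (node list) using only relationships for which rel_ok(a, b) is True."""
    if start == goal:
        return [start]
    parents = [{start: None}, {goal: None}]  # side 0 grows from start, side 1 from goal
    depths = [{start: 0}, {goal: 0}]
    frontiers = [[start], [goal]]
    levels = [0, 0]
    while frontiers[0] and frontiers[1] and levels[0] + levels[1] < max_hops:
        # Expand the smaller frontier by one full level.
        side = 0 if len(frontiers[0]) <= len(frontiers[1]) else 1
        other = 1 - side
        best, nxt = None, []
        for v in frontiers[side]:
            for w in adj[v]:
                if not rel_ok(v, w):
                    continue  # pruned: this relationship is never traversed
                if w in parents[other]:  # the two searches meet
                    total = depths[side][v] + 1 + depths[other][w]
                    if best is None or total < best[0]:
                        best = (total, v, w)
                elif w not in parents[side]:
                    parents[side][w] = v
                    depths[side][w] = depths[side][v] + 1
                    nxt.append(w)
        if best is not None:
            _, v, w = best
            a, b = (v, w) if side == 0 else (w, v)
            return _walk(parents[0], a)[::-1] + _walk(parents[1], b)
        frontiers[side] = nxt
        levels[side] += 1
    return None
```

With `no_supernode`, the search returns the three-hop detour `["s", "x", "y", "t"]`. With a predicate that accepts everything, it returns `["s", "H", "t"]`. With `max_hops=2` and `no_supernode`, it returns `None`, mirroring a `[*..2]` bound. Each side takes a full level at a time, and the cheapest meeting in that level is chosen, which keeps the result shortest.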

Some more reading here: https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/shortestpath-planning/


answered Oct 21 '22 14:10

Adam

You could also try to add a label for supernodes:

MATCH (x:Node)
WHERE size((x)--())>1000
SET x:Supernode

Does this run and finish on your data? How many supernodes and normal nodes do you have?

Then try:

MATCH (n:Node { id:'123'}),(m:Node { id:'234' })
WITH n, m
MATCH p = shortestPath((n)-[*..6]-(m))
WHERE NONE(x IN NODES(p) WHERE (x:Supernode))
RETURN p

I suspect a label check is faster than computing each node's degree.
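The `NONE(...)` over nodes here and the `ALL(...)` over relationships in the other answer accept exactly the same paths, as long as the path has at least one relationship. Every node on such a path is the start or end node of one of its relationships. Below is a quick Python check of that equivalence on a toy graph, with hypothetical names and a set standing in for the `:Supernode` label. If the other answer is right, the difference lies in whether the planner can evaluate the predicate during the search, not in which paths match:

```python
# Toy undirected graph and a precomputed supernode set (stand-in for the :Supernode label).
TOY_GRAPH = {
    "s": ["H", "x"], "t": ["H", "y"], "x": ["s", "y"], "y": ["x", "t"],
    "H": ["s", "t", "p", "q", "r"], "p": ["H"], "q": ["H"], "r": ["H"],
}
SUPERNODES = {"H"}

def simple_paths(adj, max_hops):
    """Yield every simple path with between 1 and max_hops relationships."""
    stack = [[v] for v in adj]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) - 1 < max_hops:
            for w in adj[path[-1]]:
                if w not in path:
                    stack.append(path + [w])

def node_form(path):
    """NONE(x IN nodes(p) WHERE x:Supernode)"""
    return not any(x in SUPERNODES for x in path)

def relationship_form(path):
    """ALL(rel IN relationships(p) WHERE NOT (startNode(rel):Supernode OR endNode(rel):Supernode))"""
    return all(a not in SUPERNODES and b not in SUPERNODES
               for a, b in zip(path, path[1:]))

def forms_agree(adj, max_hops=6):
    """True if both predicates accept exactly the same paths."""
    return all(node_form(p) == relationship_form(p)
               for p in simple_paths(adj, max_hops))
```

For a zero-length path (`n` and `m` being the same node) the relationship form is vacuously true, so the two forms only coincide for paths with at least one relationship.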

answered Oct 21 '22 15:10

Martin Preusse
