Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.

The query is:

Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.

(start)-[*]->(v)

for every edge E walked

if startVertex(E).someproperty != endVertex(E).someproperty, output E.

The graph may contain cycles.

example query

For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.

If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)

But I can't express this algorithm in Cypher since it's very different language.

Then i wrote this query:

find all reachable vertexes

(start)-[*]->(v), reachable = start + v. 

find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.

match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop

so the Cypher code looks like this:

MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn

The performance of this query is not good as I thought. There are index on :Col(schema)

I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.

My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.

The following code block is the logic I want:

 VertexQueue1 = (starting vertexes);
 VisitedVertexSet = (empty);
 EdgeSet1 = (empty);
 While (VertexSet1 is not empty)
 {
     Vertex0 = VertexQueue1.pop();
     VisitedVertexSet.add(Vertex0);
     foreach (Edge0 starting from Vertex0)
     {
           Vertex1 = endingVertex(Edge0);
           if (Vertex1.schema <> Vertex0.schema)
           {
               EdgeSet1.put(Edge0);
           }
           if (VisitedVertexSet.notContains(Vertex1) 
               and VertexQueue1.notContains(Vertex1)) 
           {
               VertexQueue1.push(Vertex1);
           }
     }
 }
 return EdgeSet1;

EDIT:

The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.

LEFT one:

match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
 where prev.schema<>node.schema 
return distinct prev,node

RIGHT one:

MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn

profile result

like image 231
user2218067 Avatar asked Apr 12 '26 18:04

user2218067


1 Answers

I can't say that this would be faster, but here's another way to try:

MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}

WITH collect(ID(node)) AS node_ids

MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node

I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.

like image 61
Brian Underwood Avatar answered Apr 15 '26 00:04

Brian Underwood



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!