
Better Way to remove cycles from a path in neo4j graph

Tags: neo4j, cypher

I am using Neo4j graph database version 2.1.7. Brief details about the data: 2 million nodes of 6 different types and 5 million relationships of only 5 different types. The graph is mostly connected but contains a few isolated subgraphs.

While resolving paths, I get cycles in the path. To prevent that, I used the solution shared in: Returning only simple paths in Neo4j Cypher query

Here is the query I am using:

MATCH (n:nodeA{key:905728}) 
MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA) 
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a))) 
and (length(EXTRACT (p in NODES(path)| p.key)) > 1) 
and ((exists ((c)-[:rel5]->(b)) and (not exists((b)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (b)-[]->(x))))
    OR (not exists ((c)-[:rel5]->()) and (not exists ((c)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (c)-[]->(x))))) 
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);

The above query meets my requirement, but it is not cost-effective and keeps running when it is run against a huge subgraph. I have used the PROFILE command to improve the query's performance from where I started, but I am now stuck at this point. The performance has improved, but not to what I expected from Neo4j :(

Hemant asked Oct 20 '22 12:10



1 Answer

I don't know that I have a solution, but I have a number of suggestions. Some might speed things up, some might just make the query easier to read.

Firstly, rather than putting exists ((c)-[:rel5]->(b)) in your WHERE, I believe you can put it in your MATCH like this:

MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA), (c)-[:rel5]->(b)

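Bear in mind that this makes the rel5 hop mandatory, so it only suits the first branch of your OR condition. I also don't think you need the exists keyword. I think you can just say, for example, (NOT (b)-[:rel1|rel2|rel3|rel4]->(:nodeA))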
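As a minimal sketch of that pattern-predicate form, the query below uses NOT directly on a pattern in WHERE, without exists. It reuses the labels and relationship types from the question; the RETURN columns and the LIMIT are only there for illustration and are not part of the original query.

```cypher
// Find rel5 pairs where b has no outgoing rel1..rel4 relationship to another nodeA.
// The pattern is used directly as a predicate; no exists() wrapper.
MATCH (c:nodeA)-[:rel5]->(b:nodeA)
WHERE NOT (b)-[:rel1|rel2|rel3|rel4]->(:nodeA)
RETURN c.key, b.key
LIMIT 10;
```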

I'd also suggest thinking about the WITH clause for potential performance improvements.
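One way WITH might be applied here, as a sketch rather than a drop-in replacement: filter down to simple paths first, pass only those on, and run the end-of-path check afterwards. This version deliberately ignores the rel5 hop and the OR branches of the original query, so its results will differ. Whether it is actually faster depends on the planner and should be checked with PROFILE.

```cypher
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)
// Keep only simple paths: every node appears exactly once in the path.
WHERE ALL(a IN nodes(path) WHERE 1 = length(filter(m IN nodes(path) WHERE m = a)))
WITH path, c
// Apply the end-of-path check only to the paths that survived the filter above.
WHERE NOT (c)-[:rel1|rel2|rel3|rel4]->(:nodeA)
RETURN DISTINCT EXTRACT(r IN rels(path) | type(r)) AS relTypes,
                EXTRACT(p IN nodes(path) | p.key) AS keys;
```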

A couple of notes about your variable-length paths: in *0.. the lower bound of 0 means a zero-length path can match, so c can be n itself. That may or may not be what you want. Also, leaving the variable-length path open-ended can often cause performance problems (as I think you're seeing). If you can possibly cap it, that may help.
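To get a feel for a reasonable cap, a bounded version of the traversal can be run first. In this sketch the lower bound of 1 excludes the zero-length match, and the upper bound of 8 is an arbitrary assumption to be tuned against the actual data.

```cypher
// Count the paths found at each depth, up to an assumed cap of 8 hops.
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..8]->(c:nodeA)
RETURN length(path) AS hops, count(*) AS paths
ORDER BY hops;
```

If the path count grows sharply from one depth to the next, that is a hint of how expensive the unbounded form is likely to be.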

Also, if you upgrade to 2.2.1, there are a number of built-in performance improvements in the 2.2.x line. You also get visual PROFILE output in the console and a new EXPLAIN command, which shows the execution plan without running the query, whereas PROFILE runs the query and reports the actual rows and database hits for each step.
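Assuming a 2.2.x server, both commands are simply prefixed to the query. The bounded pattern below is a shortened stand-in for the full query, with an arbitrary cap of 5 hops.

```cypher
// EXPLAIN shows the planned operators only; the query is not executed.
EXPLAIN
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..5]->(c:nodeA)
RETURN count(path);

// PROFILE executes the query and reports rows and db hits per operator.
PROFILE
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..5]->(c:nodeA)
RETURN count(path);
```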

One more thing to consider: I don't think you're hitting the performance boundaries of Neo4j but rather, perhaps, some boundaries of Cypher. If so, I might suggest you do your querying with the Java APIs that Neo4j provides, for better performance and more control. You can do this either by embedding the database, if you're using a JVM-compatible language, or by writing an unmanaged extension, which lets you do your own querying in Java and expose it through a custom REST API on the server.

Brian Underwood answered Oct 29 '22 23:10
