Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cypher: Find any path between nodes

Tags:

neo4j

cypher

I have a neo4j graph that looks like this:

Graph overview

Nodes:

  1. Blue Nodes: Account
  2. Red Nodes: PhoneNumber
  3. Green Nodes: Email

Graph design:

  • (:PhoneNumber) -[:PART_OF]->(:Account)
  • (:Email) -[:PART_OF]->(:Account)

The problem I am trying to solve is to

Find any path that exists between Account1 and Account2.

This is what I have tried so far with no success:

  1. MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
  2. MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
  3. MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
  4. MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
  5. Same queries as above without the shortest path function call.

By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.

Thanks.

like image 429
Agandalf Avatar asked Jul 03 '17 03:07

Agandalf


People also ask

How do you find the shortest path between two nodes in Neo4j?

The Cypher Query Language – CQL, has a built-in function called shortestPath() through which the shortest path can be found between nodes. The shortestPath()function takes a pattern which can comprise of the type of edge or relationship, number of hops in the path and the direction of the relationship.

How do you create a relationship between two nodes in Neo4j?

To create a relationship between two nodes, we first get the two nodes. Once the nodes are loaded, we simply create a relationship between them. The created relationship is returned by the query.

WHAT IS WITH clause in Cypher?

The WITH clause allows query parts to be chained together, piping the results from one to be used as starting points or criteria in the next. It is important to note that WITH affects variables in scope.

What is unwind in Neo4j?

With UNWIND , you can transform any list back into individual rows. These lists can be parameters that were passed in, previously collect -ed result or other list expressions. One common usage of unwind is to create distinct lists. Another is to create data from parameter lists that are provided to the query.


1 Answers

All those queries are along the right lines, but need some tweaking to make work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.

Solution for Now: Making Your Query Work

The path between any two (n:Account) nodes in your graph is going to look something like this:

(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)

Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:

<-[:PART_OF]-(:Email)-[:PART_OF]->

or

<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->

So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately to my knowledge (and I'd love to be corrected here), using straight cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search, to find nodes (a1:Account) and(a2:Account) connected through -[:PART_OF]- relationships. So, at first glance your query would look like this:

MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*]-(a2:Account { accId: {a2_id} }))
RETURN *

(notice here I've used cypher parameters rather than the integers you put in the original post)

That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).

So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with circles), and could potentially match in the region of 2^100 paths.

As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.

So, ideally we'd set out our relationship length between 2 and 10. However, cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path have a length of at least two).

MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*1..10]-(a2:Account { accId: {a2_id} }))
RETURN *

Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.

Longer Term Solution: Refactor Graph and/or Use APOC

Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

like image 146
Dom Weldon Avatar answered Oct 30 '22 05:10

Dom Weldon