
Is there a way to chain multiple WHERE statements together without switching to AND in Neo4j?

Tags: neo4j, cypher

I have a large application using a Neo4j database. There are dozens of methods that hold Cypher queries. In order to remove the duplicate code that would exist if each method stored its own complete query, I have created private methods that build small commonly used chunks of a query. For example, a method like:

MatchNodesWithLabel(string label)

would return a partial query like:

MATCH (node:label)

These method calls are then followed by methods that add WHERE or RETURN statements to build a full query. This is a small example; other methods match and filter whole sets of nodes and relationships.

The problem arises when multiple WHERE statements need to be chained together. Cypher doesn't allow a WHERE statement to follow a WHERE statement:

// Invalid
MATCH (node)
WHERE node:label
WHERE node.property = value
...

so any method past the first WHERE must insert an AND statement instead:

// Valid
MATCH (node)
WHERE node:label
AND node.property = value
AND ...

This creates a method ordering problem, where certain methods cannot be used after others. A method that inserts a WHERE statement (a Where() method) must precede any And() method. A Where() method then can't be used after an And() method, and an And() method can't be used before the first Where() method.

Here are some possible (but ultimately broken) solutions I've come up with to solve this ordering problem:

  1. Duplicate every Where() method into an equivalent AndWhere() method that does the same filtering, but uses AND instead of WHERE. Then in the calling code, use one Where() method first, and AndWhere() methods thereafter.
    • This is not an acceptable solution because of code duplication, and because it complicates the calling code. It just moves the method ordering problem to the calling code instead of solving it. I want to be able to add a Where() call whenever I want, without thinking about how my methods are ordered.
  2. Add a WHERE true statement before all the other calls, then make every Where() method insert an AND statement.
    • This only works if the methods in the calling code are specifically ordered. All MATCH methods have to precede the WHERE true, otherwise you could end up with a Cypher query like:
      MATCH (node) WHERE true AND node:type MATCH (somethingElse) AND somethingElse.property = value
      which is invalid since AND follows a MATCH without a WHERE. So once again the responsibility of method ordering has moved to the calling code. With this solution, you have to manually add a WHERE true every time you start a new series of filtering calls.
  3. Parse the existing query to determine whether a WHERE or an AND should be used at the start of every Where() method.
    • This seems OK at first: all you have to do is a check like existingQuery.Contains("WHERE") and, if it's true, insert AND instead. However, this has the same problem as #2. If a new MATCH statement is inserted, the check will still return true, but the query won't be in a state where AND is valid. So you'd have to parse further back in the query, checking for MATCH and WHERE statements and keeping track of the order they occurred in. This is far too complicated.
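The failure described in #3 can be reproduced in a few lines. This is only an illustrative sketch in Python (the application's real language is not shown here), and naive_where is a hypothetical helper standing in for a Where() method that uses the Contains("WHERE") check:

```python
def naive_where(existing_query: str, predicate: str) -> str:
    """Append a filter, picking WHERE or AND by searching the whole query text.

    Hypothetical helper: mirrors the existingQuery.Contains("WHERE") idea.
    """
    keyword = "AND" if "WHERE" in existing_query else "WHERE"
    return f"{existing_query} {keyword} {predicate}"


query = "MATCH (node)"
query = naive_where(query, "node:label")          # no WHERE yet, so WHERE is used
query += " MATCH (somethingElse)"                 # a new MATCH starts a new clause
query = naive_where(query, "somethingElse.property = value")  # old WHERE is found, so AND is used

print(query)
```

The printed query ends in MATCH (somethingElse) AND ..., which is the invalid shape from #2: the check sees the earlier WHERE and has no idea a new MATCH has started since.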

Is there some Neo4j syntax that allows a WHERE to be inserted anywhere, or a query building solution that I haven't come across?

Stephen Belden asked Oct 24 '25 02:10


2 Answers

Add a WITH * before every WHERE statement.

WITH * will carry over all named Cypher identifiers already in the query into what is essentially a new query.

So the query will end up looking like:

MATCH (node)
WITH *
WHERE ...
WITH *
WHERE ...
MATCH (somethingElse)
WITH *
WHERE ...

etc.

This allows a WHERE to be inserted anywhere in a query.

So every Where() method will insert WITH * WHERE instead of just WHERE or AND.

Important note: This will slow down the query. Every WITH is a projection and projections aren't free. In my testing a query using WITH * WHERE is 30% - 50% slower than the equivalent WHERE ... AND query.

This is definitely an ugly solution. Hopefully a better solution exists, one that is just as fast as a WHERE ... AND query and doesn't clutter the query text, but this solution works as long as speed and query readability are acceptable sacrifices.

EDIT: I've discovered that this solution also ignores indexes in the WHERE statements! This makes the performance of these queries unacceptably slow for large data sets. I'd strongly advise against using this trick and instead sacrifice some code readability or code duplication for decent performance.

Stephen Belden answered Oct 26 '25 20:10


Your code can have 2 collections: say matches and wheres. Your MatchX() and WhereY() methods can append appropriate data to their respective collections. When you are ready to submit your query, you can generate the MATCH and WHERE clauses from the data in those collections.
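A minimal sketch of this idea, written in Python for illustration. The class and method names (QueryBuilder, match, where, returning, build) and the $value parameter are hypothetical, not part of any Neo4j driver. It assumes every pattern is a plain MATCH (no OPTIONAL MATCH) and that every filter is meant to be AND-ed together, so a single WHERE after the last MATCH is equivalent to filtering after each one:

```python
class QueryBuilder:
    """Hypothetical builder: collects patterns and predicates, renders once at the end."""

    def __init__(self):
        self._matches = []   # patterns, one MATCH clause each
        self._wheres = []    # predicates, joined with AND in a single WHERE
        self._returns = []   # expressions for the RETURN clause

    def match(self, pattern):
        self._matches.append(pattern)
        return self

    def where(self, predicate):
        # Callers should parenthesize predicates that contain OR,
        # because all predicates are joined with AND.
        self._wheres.append(predicate)
        return self

    def returning(self, expression):
        self._returns.append(expression)
        return self

    def build(self):
        lines = [f"MATCH {pattern}" for pattern in self._matches]
        if self._wheres:
            # The first predicate gets WHERE, every later one gets AND.
            lines.append("WHERE " + "\n  AND ".join(self._wheres))
        if self._returns:
            lines.append("RETURN " + ", ".join(self._returns))
        return "\n".join(lines)


# match() and where() calls can be interleaved in any order.
query = (
    QueryBuilder()
    .match("(node)")
    .where("node:label")
    .match("(somethingElse)")
    .where("somethingElse.property = $value")
    .returning("node")
    .build()
)
print(query)
```

Because the WHERE/AND decision is made only when the query is rendered, no Where() method needs to know what was called before it. The trade-off is that filters are no longer textually next to the MATCH they were added after, which matters if the query needs OPTIONAL MATCH or WITH boundaries.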

cybersam answered Oct 26 '25 19:10