Why do I get a "Cartesian Product" warning?

Tags:

neo4j

I am still trying to understand why I get a cartesian product warning for a certain format for a query in neo4j and not for another. This is how I set up my database:

CREATE (q:Form {version: "1.0"})
CREATE (q:Question {text: "Sector de la empresa", active: true})

I then tried the following query:

MATCH
(f:Form {version: "1.0"}),
(q:Question {text: "Sector de la empresa"})
CREATE (f)-[:asks]->(q)
RETURN f, q

However, I get the following warning:

This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns,
this will build a cartesian product between all those parts.
This may produce a large amount of data and slow down query processing.
While occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (q))

When I use the following query, it does not give me this warning:

MATCH (f:Form {version: "1.0"})
WITH f
(q:Question {text: "Sector de la empresa"})
CREATE (f)-[:asks]->(q)
RETURN f, q

nor when I use this query:

MATCH (f:Form {version: "1.0"})
MATCH (q:Question {text: "Sector de la empresa"})
CREATE (f)-[:asks]->(q)
RETURN f, q

I used this following article as a resource, but it still didn't fully answer my question: Why does neo4j warn: "This query builds a cartesian product between disconnected patterns"?

Why do I get a cartesian product for some formats of a query and not others? Also, I do not fully understand what a cartesian product warning is.

428

asked Jun 10 '16 14:06

2 Answers

If you are MATCHing on two different labels without any relationships between them, then you'll get this warning. The reason is because if you do:

MATCH (a:Foo), (b:Bar)

It's Neo4j's job to find every possible combination of those two nodes. So for the first match of a it will return a row for every match of b, for the second match of a it will again return a row for every match of b, and so on. So you'll get (number of Foo nodes) x (number of Bar nodes) total rows in your result. As your database grows this is really bad for performance.

I can see that you're filtering on version for Form and text for Question, so that would help. That may even give you just one Form node and one Question node. So as long as you have an index on the Form(version) and Question(text) the query should be quite quick. Neo4j can't tell (or at least, isn't currently implemented to be able to tell) how many rows are going to be returned, so it gives a warning saying that your query could be potentially slow.

131

answered Sep 28 '22 05:09

Brian Underwood

They are all cartesian

Having read your question, for a second there, my cypher-world imploded - all three queries should involve a cartesian product.

Having checked (on both the console and a local DB - both version 3.3.0), turns out I'm sane - they do all involve a cartesian product:

The execution plan of the first query, showing a cartesian product

The execution plan of the second query, showing a cartesian product

The execution plan of the third query, showing a cartesian product

Why there is only a warning in the first case (still in version 3.3.0 I have no clue - you simply need to run the planner to figure this out, and if this isn't firing the warning what does? Some dumb cypher logic?

Cypher basics

Cypher queries are made of parts, each can be either update (write) or read.

As far as read parts are concerned, this is what happens:

Neo4j picks a starting point which it reckons will yield the least 'hits'. It will go through each of these hits...
...traversing the graph using the node/relationship pattern.
It does so repeatedly until no more patterns are matched.

If you have something like this:

(a {name:'Bill'})-->(b:Dog)

The plan might look something like this.

For each node (AKA AllNodeScan):
- Filter based on the predicate (name == 'bill')
- Get all outgoing --> relationships
- For each relationship:
  - Get the end node
  - Filter based on predicate (:Dog)

The important thing is that whilst to find (a) we need to scan each node. But we simply traverse the graph to find the (b)s - no AllNodeScan for the latter.

(There are variants of AllNodeScan, see Starting Node Operators)

When your query is something like this:

MATCH (f:Form {version: "1.0"}), (q:Question {text: "Sector de la empresa"})

Neo is forced to do an AllNodeScan for both f and q - There is no pattern to traverse between them. This can potentially create a result set of an f * q size, which could be huge.

answered Sep 28 '22 04:09

Izhaki

Related questions
                            
                                Q: How to include a WHERE clause in a MERGE statement in Neo4j
                            
                                Neo4J unique constraint for multiple labels
                            
                                Time Based Graph Data Modeling
                            
                                How to merge nodes that have the same value for name property in Neo4j
                            
                                Does it make sense to use neo4j to index a file system
                            
                                Neo4j in the cloud [closed]
                            
                                Generic traversal of a directed tree with Neo4J
                            
                                Which jars exactly are needed to embed Neo4j?
                            
                                Issue wiring together spring-data-neo4j standalone spring project
                            
                                What is the most efficient way to insert nodes into a neo4j database using cypher
                            
                                Understanding Neo4j Cypher Profile keyword and execution plan
                            
                                Spring @Autowired Not Working Error Creating Bean Injection Of Autowire dependencies failed
                            
                                Neo4j graph visualizing libraries
                            
                                Is it possible to constraint edge multiplicity in Neo4j / OrientDB?
                            
                                Is it possible to dynamically construct a neo4j cypher query using the GraphRepository pattern
                            
                                How to code the hierarchical relationship to the node of the same type properly in spring data neo4j?
                            
                                In Neo4j, what level of specificity should be used when granularity level can be unlimited?
                            
                                How do you use parameters with neo4j?
                            
                                How to retrieve nodes for multiple depth relationships Neo4j Database Cypher?
                            
                                Recursive query with sub-graph aggregation (arbitrary depth)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With