Does label order affect search time?

Tags:

neo4j

cypher

I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries that search for nodes with several labels, and I found that, in general, the queries

Match (p:A:B) return count(p) as number

and

Match (p:B:A) return count(p) as number

take different amounts of time to run, especially when you have, for example, 2 million A nodes and 0 B nodes. So does label order affect search time? Is this feature documented anywhere?


asked Feb 10 '15 14:02

Evgen

1 Answer

Neo4j internally maintains a label scan store: basically a lookup that quickly returns all nodes carrying a given label A.

When doing a query like

MATCH (n:A:B) return count(n)

the label scan store is used to find all A nodes, and each of them is then checked for whether it also carries label B. If n(A) >> n(B), it's far more efficient to write MATCH (n:B:A) instead, since you look up only the few B nodes and filter those for A.
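To make the cost difference concrete, here is a toy Python model, not Neo4j's actual implementation: each label maps to a set of node ids (standing in for the label scan store), and the plan scans the first label written in the pattern, then checks each candidate for the remaining labels. The names `build_store` and `count_with_labels` are hypothetical, and "nodes touched" is only a rough stand-in for DbHits.

```python
# Toy label scan store (hypothetical, not Neo4j internals): label -> set of node ids.
def build_store(n_a_only, n_b_only, n_both):
    store = {"A": set(), "B": set()}
    node_id = 0
    for _ in range(n_both):        # nodes carrying both A and B
        store["A"].add(node_id)
        store["B"].add(node_id)
        node_id += 1
    for _ in range(n_a_only):      # nodes carrying only A
        store["A"].add(node_id)
        node_id += 1
    for _ in range(n_b_only):      # nodes carrying only B
        store["B"].add(node_id)
        node_id += 1
    return store


def count_with_labels(store, labels):
    """Scan the FIRST written label, then filter on the rest.

    Returns (matching_count, nodes_touched).
    """
    first, rest = labels[0], labels[1:]
    matches = 0
    touched = 0
    for node in store[first]:
        touched += 1  # every scanned node has to be examined
        if all(node in store[label] for label in rest):
            matches += 1
    return matches, touched
```

With `build_store(n_a_only=100_000, n_b_only=10, n_both=1)`, both `["A", "B"]` and `["B", "A"]` find the single matching node, but the first order touches 100,001 nodes while the second touches only 11. The answer is the same; only the work differs.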

You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.

Starting with Neo4j 2.2 (milestone M03 is available as of writing this reply) there's a cost-based Cypher optimizer. Cypher is now aware of node statistics, such as how many nodes carry each label, and uses them to choose the query plan.
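The idea behind the cost-based choice can be sketched in a few lines of Python. This assumes the cost of a plan is simply the number of nodes returned by the initial label scan; Neo4j's real cost model is richer, and `choose_scan_label` and `plan` are hypothetical names used only for illustration.

```python
def choose_scan_label(labels, label_counts):
    """Pick the label with the fewest nodes to scan first.

    `label_counts` stands in for the database's node statistics; a label
    missing from it is treated as having 0 nodes. On ties, min() keeps the
    first label as written.
    """
    return min(labels, key=lambda label: label_counts.get(label, 0))


def plan(labels, label_counts):
    # Scan the cheapest label, then filter on every other label.
    scan = choose_scan_label(labels, label_counts)
    filters = [label for label in labels if label != scan]
    return {"NodeByLabelScan": scan, "Filter": filters}
```

With counts like `{"A": 1_000_000, "B": 100}`, `plan(["A", "B"], counts)` and `plan(["B", "A"], counts)` both return `{"NodeByLabelScan": "B", "Filter": ["A"]}`, so the order written in the query no longer matters.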

As an example I've used the following statements to create some test data:

create (:A:B);
with 1 as a foreach (x in range(0,1000000)  | create (:A));
with 1 as a foreach (x in range(0,100)  | create (:B));
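Cypher's `range(start, end)` includes `end`, so each FOREACH above creates one node more than the round number suggests. This small Python tally only does the arithmetic (it does not talk to a database) and shows how many nodes end up carrying each label; the helper names are hypothetical.

```python
def cypher_range_len(start, end):
    # Cypher's range(start, end) includes both bounds (default step 1).
    return end - start + 1


def label_totals():
    both = 1                                  # create (:A:B)
    a_only = cypher_range_len(0, 1_000_000)   # foreach ... create (:A)
    b_only = cypher_range_len(0, 100)         # foreach ... create (:B)
    # The single A:B node counts toward both labels.
    return {"A": a_only + both, "B": b_only + both}
```

`label_totals()` gives `{"A": 1_000_002, "B": 102}`: roughly a million A nodes and about a hundred B nodes, with one node carrying both.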

We now have about 100 B nodes, 1M A nodes, and one node carrying both A and B. In 2.2 the two statements:

MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)

result in exactly the same query plan (and therefore the same execution speed):

+------------------+---------------+------+--------+-------------+---------------+
|         Operator | EstimatedRows | Rows | DbHits | Identifiers |         Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation |             3 |    1 |      0 |    count(n) |               |
|           Filter |            12 |    1 |     12 |           n | hasLabel(n:A) |
|  NodeByLabelScan |            12 |   12 |     13 |           n |            :B |
+------------------+---------------+------+--------+-------------+---------------+

Since there are only a few B nodes, it's cheaper to scan for B nodes and filter for A. Smart Cypher, isn't it? ;-)


answered Sep 19 '22 03:09

Stefan Armbruster
