 

Does label order affect search time?

Tags:

neo4j

cypher

I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries, searching for nodes with several labels, and I found that, in general, the queries

Match (p:A:B) return count(p) as number

and

Match (p:B:A) return count(p) as number

take noticeably different amounts of time, especially when you have, for example, 2 million A nodes and 0 B nodes. So does label order affect search time? Is this behaviour documented anywhere?

Evgen asked Feb 10 '15 at 14:02





1 Answer

Neo4j internally maintains a label scan store: basically a lookup that quickly returns all nodes carrying a given label, e.g. A.

When doing a query like

MATCH (n:A:B) return count(n)

the label scan store is used to find all A nodes, which are then filtered to keep only those that also carry label B. If n(A) >> n(B), it's far more efficient to write MATCH (n:B:A) instead, since you look up only the few B nodes and filter those for A.
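To make the cost difference concrete, here is a minimal Python sketch built on loud assumptions: LabelScanStore and match_in_written_order are hypothetical names, and the model only counts how many nodes each plan touches, ignoring Neo4j's real storage layout and DbHits accounting. It mimics the order-dependent behaviour described for Neo4j <= 2.1.x: scan the first label written in the pattern, then filter on the rest.

```python
from collections import defaultdict


class LabelScanStore:
    """Hypothetical toy model of a label index (not Neo4j's actual API):
    maps each label to the set of node ids carrying it."""

    def __init__(self):
        self.by_label = defaultdict(set)
        self.labels_of = {}

    def add_node(self, node_id, *labels):
        self.labels_of[node_id] = set(labels)
        for label in labels:
            self.by_label[label].add(node_id)

    def scan(self, label):
        # Return every node id carrying `label` (empty set if none).
        return self.by_label.get(label, set())


def match_in_written_order(store, labels):
    """Mimic an order-dependent plan: scan the FIRST label written in the
    pattern, then filter on the remaining labels.
    Returns (matching_count, nodes_touched)."""
    first, rest = labels[0], labels[1:]
    count = touched = 0
    for node_id in store.scan(first):
        touched += 1  # every scanned node costs a lookup
        if all(label in store.labels_of[node_id] for label in rest):
            count += 1
    return count, touched


# Scaled-down version of the question's data: one A:B node, many A, few B.
store = LabelScanStore()
store.add_node(0, "A", "B")
for i in range(1, 1001):
    store.add_node(i, "A")
for i in range(1001, 1011):
    store.add_node(i, "B")

print(match_in_written_order(store, ["A", "B"]))  # (1, 1001)
print(match_in_written_order(store, ["B", "A"]))  # (1, 11)
```

Both orders find the same single A:B node, but scanning A first touches 1001 nodes while scanning B first touches only 11.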

You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.

Starting with Neo4j 2.2 (milestone M03 was available at the time of writing) there's a cost-based Cypher optimizer. Cypher is now aware of node statistics and uses them to optimize the query.

As an example I've used the following statements to create some test data:

create (:A:B);
with 1 as a foreach (x in range(1,1000000) | create (:A));
with 1 as a foreach (x in range(1,100) | create (:B));

We now have 100 B nodes, 1M A nodes and 1 AB node. In 2.2 the two statements:

MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)

result in exactly the same query plan (and therefore the same execution speed):

+------------------+---------------+------+--------+-------------+---------------+
|         Operator | EstimatedRows | Rows | DbHits | Identifiers |         Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation |             3 |    1 |      0 |    count(n) |               |
|           Filter |            12 |    1 |     12 |           n | hasLabel(n:A) |
|  NodeByLabelScan |            12 |   12 |     13 |           n |            :B |
+------------------+---------------+------+--------+-------------+---------------+

Since there are only a few B nodes, it's cheaper to scan for B nodes and filter for A. Smart Cypher, isn't it ;-)
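The idea behind the cost-based choice can be sketched in a few lines of Python. This is a hypothetical model, not the Neo4j planner: build_store, plan and count_matches are illustrative names, and a plain per-label node count stands in for the statistics the 2.2 optimizer consults. Whatever order the labels are written in, the sketch scans the label with the fewest nodes and filters on the others.

```python
from collections import defaultdict


def build_store(node_labels):
    """Hypothetical toy label index: node_labels[i] is the label tuple of node i."""
    by_label = defaultdict(set)
    labels_of = {}
    for node_id, labels in enumerate(node_labels):
        labels_of[node_id] = set(labels)
        for label in labels:
            by_label[label].add(node_id)
    return by_label, labels_of


def plan(by_label, labels):
    """Cost-based choice: scan the label with the fewest nodes (standing in for
    the planner's label statistics) and filter on the others."""
    scan_label = min(labels, key=lambda label: len(by_label.get(label, ())))
    filters = [label for label in labels if label != scan_label]
    return scan_label, filters


def count_matches(by_label, labels_of, labels):
    # Execute the chosen plan: label scan, then hasLabel-style filtering.
    scan_label, filters = plan(by_label, labels)
    return sum(
        1
        for node_id in by_label.get(scan_label, ())
        if all(label in labels_of[node_id] for label in filters)
    )


# Same shape as the answer's test data, scaled down: 1 A:B, 10,000 A, 100 B.
nodes = [("A", "B")] + [("A",)] * 10_000 + [("B",)] * 100
by_label, labels_of = build_store(nodes)

print(plan(by_label, ["A", "B"]))  # ('B', ['A'])
print(plan(by_label, ["B", "A"]))  # ('B', ['A'])
print(count_matches(by_label, labels_of, ["A", "B"]))  # 1
```

Both label orders yield the same plan, which mirrors the identical NodeByLabelScan :B / Filter hasLabel(n:A) plan in the output above. With 0 B nodes, as in the question, the sketch would scan B and return immediately.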

Stefan Armbruster answered Sep 19 '22 at 03:09
