I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries that search for nodes with several labels, and I found that, in general, the queries
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
take different amounts of time to run, especially when you have, for example, 2 million A nodes and 0 B nodes. So does the order of labels affect search time? Is this feature documented anywhere?
Neo4j internally maintains a label scan store: basically a lookup that quickly returns all nodes carrying a given label, e.g. A.
When doing a query like
MATCH (n:A:B) return count(n)
the label scan store is used to find all A nodes, which are then filtered to keep only those that also carry label B. If n(A) >> n(B), it's far more efficient to run MATCH (n:B:A) instead, since you look up only the few B nodes and filter those for A.
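To make the cost difference concrete, here is a toy Python model. It is an illustrative assumption, not Neo4j's actual storage code: a dict of label to node ids stands in for the label scan store, and the hypothetical helper `count_with_labels` scans the first label and filters on the rest, returning the number of matches plus how many nodes it had to visit.

```python
# Toy model (an assumption for illustration, not Neo4j's implementation):
# a "label scan store" mapping each label to the set of node ids carrying it.
from collections import defaultdict


def build_label_store(nodes):
    """nodes: dict of node id -> set of labels."""
    store = defaultdict(set)
    for node_id, labels in nodes.items():
        for label in labels:
            store[label].add(node_id)
    return store


def count_with_labels(nodes, store, first, *rest):
    """Scan every node with label `first`, then keep those carrying all `rest`.

    Returns (matching node count, nodes visited by the scan); the second value
    is a rough stand-in for the work done by the filter step.
    """
    visited = 0
    matches = 0
    for node_id in store.get(first, set()):
        visited += 1
        if all(label in nodes[node_id] for label in rest):
            matches += 1
    return matches, visited


# Many A-only nodes and a single node carrying both labels.
nodes = {i: {"A"} for i in range(10_000)}
nodes[10_000] = {"A", "B"}
store = build_label_store(nodes)

print(count_with_labels(nodes, store, "A", "B"))  # (1, 10001): scans every A
print(count_with_labels(nodes, store, "B", "A"))  # (1, 1): scans the lone B
```

Both orders return the same count; only the number of nodes visited changes, which is exactly why the label order mattered for the pre-2.2 planner.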
You can prefix the query with PROFILE, e.g. PROFILE MATCH (n:A:B) return count(n), to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order in which you specify the labels.
Starting with Neo4j 2.2 (milestone M03 was available at the time of writing), Cypher has a cost-based optimizer. Cypher is now aware of node statistics and uses them to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
We now have about 100 B nodes, about 1M A nodes and 1 node carrying both A and B. In 2.2 the two statements:
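Cypher's range() includes both endpoints, so the exact number of nodes each label scan would visit can be checked with a little arithmetic. A Python sketch, where `cypher_range` is a hypothetical helper that mimics Cypher's inclusive range(start, end) using Python's exclusive range:

```python
def cypher_range(start, end):
    """Mimic Cypher's inclusive range(start, end) with step 1 (hypothetical helper)."""
    return range(start, end + 1)


a_only = len(cypher_range(0, 1000000))  # nodes created with only :A
b_only = len(cypher_range(0, 100))      # nodes created with only :B
both = 1                                # the single (:A:B) node

total_a = a_only + both  # nodes a scan of :A has to visit
total_b = b_only + both  # nodes a scan of :B has to visit
print(a_only, b_only, total_a, total_b)  # 1000001 101 1000002 102
```

Scanning :A visits roughly ten thousand times as many nodes as scanning :B, even though both orders return the same single node.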
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator         | EstimatedRows | Rows | DbHits | Identifiers | Other         |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation |             3 |    1 |      0 | count(n)    |               |
| Filter           |            12 |    1 |     12 | n           | hasLabel(n:A) |
| NodeByLabelScan  |            12 |   12 |     13 | n           | :B            |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only a few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it? ;-)
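The statistics-driven choice can be sketched in a few lines of Python. This is a hypothetical simplification, not Neo4j's planner: `plan_label_match` and `label_counts` are illustrative names, and the real cost-based optimizer weighs far more information. The sketch scans the label with the fewest nodes and filters on the others, so the order in which the labels are written no longer matters.

```python
def plan_label_match(labels, label_counts):
    """Pick the label to scan and the labels to filter on (hypothetical sketch).

    Scans the label with the fewest nodes according to `label_counts`;
    a label missing from the statistics is treated as having 0 nodes.
    """
    scan_label = min(labels, key=lambda label: label_counts.get(label, 0))
    filter_labels = [label for label in labels if label != scan_label]
    return scan_label, filter_labels


# Approximate counts for the test data above.
counts = {"A": 1_000_000, "B": 100}
print(plan_label_match(["A", "B"], counts))  # ('B', ['A'])
print(plan_label_match(["B", "A"], counts))  # ('B', ['A'])
```

Both label orders yield the same plan (scan :B, filter for :A), mirroring the identical NodeByLabelScan/Filter plans in the PROFILE output.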