I observere an enormous runtime difference between those two AQL statements an a DB set with about 20 Mio records:
FOR e IN EAll
FILTER e.lastname == "Kmp" // <-- skip-index
FILTER e.lastpaff != "" // <-- no index
RETURN e
// runs in less than a second
AND
FOR e IN EAll
FILTER e.lastpaff != "" // <-- no index
FILTER e.lastname == "Kmp" // <-- skip-index
RETURN e
// needs about a minute to execute.
In addition to be (or not) indexed, the selectivity of those statements is highly different: the indexedAttribute is highly selective where-as the nonIndexedAttribute only filters 50%.
Is it possible that there is not yet an optimization rule for that? I currently am using ArangoDB 2.4.0.
DETAILS:
There is a SKIP-Index on the indexed Attribute (which seems to be used in the execuation plan 1). Here are the execuation plan, in which only the order of the filters are changed:
FAST QUERY:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "IndexRangeNode",
"dependencies" : [
1
],
"id" : 8,
"estimatedCost" : 170463.32,
"estimatedNrItems" : 170462,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"ranges" : [
[
{
"variable" : "i",
"attr" : "lastname",
"lowConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"highConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"lows" : [ ],
"highs" : [ ],
"valid" : true,
"equality" : true
}
]
],
"index" : {
"type" : "skiplist",
"id" : "13295598550318",
"unique" : false,
"fields" : [
"lastname"
]
},
"reverse" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
8
],
"id" : 5,
"estimatedCost" : 340925.32,
"estimatedNrItems" : 170462,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 511387.32,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2",
"use-index-range",
"remove-filter-covered-by-index"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
SLOW Query:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "EnumerateCollectionNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 17046233,
"estimatedNrItems" : 17046232,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"random" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 34092465,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 1,
"name" : "1"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 51138697,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 1,
"name" : "1"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 68184929,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastname",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : "Kmp"
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 85231161,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
Indeed, conditions like the following disabled the usage of indexes even though an index could be used:
FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...
Interestingly an index is used when the two conditions are put into the same FILTER condition and combined with &&:
FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...
Though these two statements are equivalent, they trigger a slightly different code path. The former will be AND-combining two existing FILTER ranges, the latter one will produce a range from a single FILTER. The case of AND-combination for the FILTER ranges was overly defensive and rejected both sides even if only a single side (in this case the one with the non-equality operator) could not be used for an index scan.
This has been fixed in 2.4, and the fix will be contained in 2.4.2. A workaround for now is to combine the two FILTER statements in a single one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With