Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query Rule for non-indexed attribute FILTER

Tags:

arangodb

I observere an enormous runtime difference between those two AQL statements an a DB set with about 20 Mio records:

FOR e IN EAll
    FILTER e.lastname == "Kmp"  // <-- skip-index
    FILTER e.lastpaff != ""     // <-- no index
RETURN e
// runs in less than a second

AND

FOR e IN EAll
    FILTER e.lastpaff != ""     // <-- no index
    FILTER e.lastname == "Kmp"  // <-- skip-index
RETURN e
// needs about a minute to execute.

In addition to be (or not) indexed, the selectivity of those statements is highly different: the indexedAttribute is highly selective where-as the nonIndexedAttribute only filters 50%.

Is it possible that there is not yet an optimization rule for that? I currently am using ArangoDB 2.4.0.

DETAILS:

There is a SKIP-Index on the indexed Attribute (which seems to be used in the execuation plan 1). Here are the execuation plan, in which only the order of the filters are changed:

    FAST QUERY:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "IndexRangeNode",
            "dependencies" : [
              1
            ],
            "id" : 8,
            "estimatedCost" : 170463.32,
            "estimatedNrItems" : 170462,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "ranges" : [
              [
                {
                  "variable" : "i",
                  "attr" : "lastname",
                  "lowConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "highConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "lows" : [ ],
                  "highs" : [ ],
                  "valid" : true,
                  "equality" : true
                }
              ]
            ],
            "index" : {
              "type" : "skiplist",
              "id" : "13295598550318",
              "unique" : false,
              "fields" : [
                "lastname"
              ]
            },
            "reverse" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              8
            ],
            "id" : 5,
            "estimatedCost" : 340925.32,
            "estimatedNrItems" : 170462,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 511387.32,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 681849.3200000001,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2",
          "use-index-range",
          "remove-filter-covered-by-index"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 681849.3200000001,
        "estimatedNrItems" : 170462
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }



    SLOW Query:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "EnumerateCollectionNode",
            "dependencies" : [
              1
            ],
            "id" : 2,
            "estimatedCost" : 17046233,
            "estimatedNrItems" : 17046232,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "random" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              2
            ],
            "id" : 3,
            "estimatedCost" : 34092465,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 1,
              "name" : "1"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              3
            ],
            "id" : 4,
            "estimatedCost" : 51138697,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 1,
              "name" : "1"
            }
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              4
            ],
            "id" : 5,
            "estimatedCost" : 68184929,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare ==",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastname",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : "Kmp"
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 85231161,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 102277393,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 102277393,
        "estimatedNrItems" : 17046232
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }
like image 325
augustin-s Avatar asked Mar 12 '26 21:03

augustin-s


1 Answers

Indeed, conditions like the following disabled the usage of indexes even though an index could be used:

FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...

Interestingly an index is used when the two conditions are put into the same FILTER condition and combined with &&:

FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...

Though these two statements are equivalent, they trigger a slightly different code path. The former will be AND-combining two existing FILTER ranges, the latter one will produce a range from a single FILTER. The case of AND-combination for the FILTER ranges was overly defensive and rejected both sides even if only a single side (in this case the one with the non-equality operator) could not be used for an index scan.

This has been fixed in 2.4, and the fix will be contained in 2.4.2. A workaround for now is to combine the two FILTER statements in a single one.

like image 173
stj Avatar answered Mar 17 '26 04:03

stj



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!