
Elasticsearch Multiple Prefix Keywords

I need to use the prefix filter, but allow multiple different prefixes, i.e.

{"prefix": {"myColumn": ["This", "orThis", "orEvenThis"]}}

This does not work. And if I add each one as a separate prefix, it obviously doesn't work either.

Help is appreciated.

Update

I tried should but without any luck:

$this->dsl['body']['query']['bool']['should'] = [
    ["prefix" => ["myColumn" =>  "This"]],
    ["prefix" => ["myColumn" =>  "orThis"]]
];

When I add those two constraints, I get ALL responses (as though the filter is not working). But if I use must with either one of those clauses, then I do get a response back with the correct prefix.

Kousha asked Jul 05 '16 19:07




1 Answer

Based on your comments, it sounds like it may just be an issue with the syntax. With all ES queries (just like SQL ones), I suggest starting simple and submitting them to ES as raw DSL outside of code (although in your case this wasn't easily doable). The request itself is pretty straightforward:

{
  "query" : {
    "bool" : {
      "must" : [ ... ],
      "filter" : [
        {
          "bool" : {
            "should" : [
              {
                "prefix" : {
                  "myColumn" : "This"
                }
              },
              {
                "prefix" : {
                  "myColumn" : "orThis"
                }
              },
              {
                "prefix" : {
                  "myColumn" : "orEvenThis"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

I added it as a filter because the prefixes are not there to improve relevancy: the request is literally that at least one of them must match. Whenever the question is "does this match? yes / no", you should use a filter (with the added bonus that filters are cacheable!). If you're asking "does this match, and which matches better?" then you want a query (because that's relevancy / scoring).
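Since the list of prefixes usually comes from code rather than being written by hand, here is a minimal sketch of how the request body above could be assembled from a list. It is written in Python for brevity and only builds the dictionary; the helper name `prefix_filter_query` is made up for this illustration and is not part of any Elasticsearch client.

```python
import json


def prefix_filter_query(field, prefixes, must=None):
    """Build a request body that keeps documents whose `field` starts with any prefix.

    Hypothetical helper for illustration: it only assembles the DSL shown above
    and does not talk to Elasticsearch.
    """
    # One prefix clause per allowed prefix; inside bool/should they act as an OR.
    should = [{"prefix": {field: prefix}} for prefix in prefixes]

    bool_query = {
        # Wrapping the OR in a filter makes it a yes/no condition without scoring.
        "filter": [{"bool": {"should": should}}],
    }
    if must:
        # Any scoring clauses the caller already has stay in must.
        bool_query["must"] = list(must)

    return {"query": {"bool": bool_query}}


body = prefix_filter_query("myColumn", ["This", "orThis", "orEvenThis"])
print(json.dumps(body, indent=2))
```

The same structure maps one-to-one onto nested PHP arrays, so the resulting body could be assigned to `$this->dsl['body']` in the same way as in the question.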

Note: The initial issue appeared to be that the existing bool / must went unmentioned, so the suggestion was to just use a bool / should.

{
  "bool" : {
    "should" : [
      {
        "prefix" : {
          "myColumn" : "This"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orThis"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orEvenThis"
        }
      }
    ]
  }
}

behaves differently than

{
  "bool" : {
    "must" : [ ... ],
    "should" : [
      {
        "prefix" : {
          "myColumn" : "This"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orThis"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orEvenThis"
        }
      }
    ]
  }
}

because the must changes whether should is required. Without must (or filter), at least one should clause has to match, so should behaves like a boolean OR. With must, however, should becomes completely optional and only improves relevancy (score). To get the boolean OR behavior back alongside must, you must add minimum_should_match to the bool compound query.

{
  "bool" : {
    "must" : [ ... ],
    "should" : [
      {
        "prefix" : {
          "myColumn" : "This"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orThis"
        }
      },
      {
        "prefix" : {
          "myColumn" : "orEvenThis"
        }
      }
    ],
    "minimum_should_match" : 1
  }
}

Notice that it's a component of the bool query, and not of either should or must!
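To make the difference between the three variants concrete, here is a toy model of the matching rule in Python. It is not Elasticsearch code: it only understands prefix clauses, treats minimum_should_match as a plain integer, and ignores scoring. The `status` field and the sample document are invented for the example.

```python
def matches(doc, bool_query):
    """Toy model of bool matching for prefix clauses only (ignores scoring)."""

    def clause_hits(clause):
        # Each clause looks like {"prefix": {field: prefix}}.
        field, prefix = next(iter(clause["prefix"].items()))
        return str(doc.get(field, "")).startswith(prefix)

    must = bool_query.get("must", [])
    filt = bool_query.get("filter", [])
    should = bool_query.get("should", [])

    # Default: at least one should clause is required only when there is
    # no must or filter clause next to it; otherwise should is optional.
    default_msm = 1 if should and not (must or filt) else 0
    msm = bool_query.get("minimum_should_match", default_msm)

    required_ok = all(clause_hits(c) for c in must + filt)
    should_hits = sum(clause_hits(c) for c in should)
    return required_ok and should_hits >= msm


# Hypothetical document: satisfies the must clause, matches none of the prefixes.
doc = {"myColumn": "Other value", "status": "active"}
should = [{"prefix": {"myColumn": "This"}}, {"prefix": {"myColumn": "orThis"}}]
must = [{"prefix": {"status": "act"}}]

print(matches(doc, {"should": should}))                # False: should acts as OR
print(matches(doc, {"must": must, "should": should}))  # True: should is optional
print(matches(doc, {"must": must, "should": should,
                    "minimum_should_match": 1}))       # False: OR is enforced again
```

The second call shows the symptom from the question: once a must clause is present, a document that matches none of the prefixes still comes back unless minimum_should_match is set.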

pickypg answered Sep 23 '22 00:09
