
Elasticsearch query using match_phrase_prefix and fuzziness at the same time?

I am new to Elasticsearch, so I am struggling a bit to find the optimal query for our data.

Imagine I want to match the following name: "Handelsstandens Boldklub".

Currently, I'm using the following query:

{
  query: {
    bool: {
      should: [
        {
          match: {
            name: {
              query: query, slop: 5, type: "phrase_prefix"
            }
          }
        },
        {
          match: {
            name: {
              query: query,
              fuzziness: "AUTO",
              operator: "and"
            }
          }
        }
      ]
    }
  }
}

It currently lists the result if I search for "Hand", but if I search for "Handle" it is no longer listed, because I made a typo. However, if I keep typing until "Handlesstandens", it is listed again: the fuzziness catches the typo, but only once I have typed the whole word.
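A rough plain-Ruby simulation can make this behaviour concrete. It is not Elasticsearch itself: it assumes the name is indexed as the two lowercase terms `handelsstandens` and `boldklub`, treats the query as a single term, applies the documented `AUTO` fuzziness rule (exact match for terms of up to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms), and counts a swap of adjacent characters as one edit, mirroring Elasticsearch's default `fuzzy_transpositions: true`. The helper names are made up for illustration.

```ruby
# Rough simulation of the two `should` clauses against one document.
# Assumption: "Handelsstandens Boldklub" is indexed as these lowercase terms.
INDEXED_TERMS = %w[handelsstandens boldklub].freeze

# Edit distance where swapping two adjacent characters counts as one edit
# (optimal string alignment), similar to fuzzy_transpositions: true.
def osa_distance(a, b)
  d = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  (0..a.length).each { |i| d[i][0] = i }
  (0..b.length).each { |j| d[0][j] = j }
  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
      # A transposition of adjacent characters costs a single edit
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end
  d[a.length][b.length]
end

# Edits allowed by fuzziness "AUTO", based on the query term's length
def auto_fuzziness(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

# Phrase prefix clause, single-term query: the term only has to be a prefix
def prefix_hit?(query_term)
  INDEXED_TERMS.any? { |t| t.start_with?(query_term) }
end

# Fuzzy clause: the whole query term must be within the allowed edits
def fuzzy_hit?(query_term)
  INDEXED_TERMS.any? { |t| osa_distance(query_term, t) <= auto_fuzziness(query_term) }
end

%w[hand handle handlesstandens].each do |q|
  puts "#{q}: prefix=#{prefix_hit?(q)} fuzzy=#{fuzzy_hit?(q)}"
end
```

Under these assumptions, `hand` passes only the prefix check, `handle` fails both (it is not a prefix of `handelsstandens` and is at least 9 edits away from it), and `handlesstandens` passes only the fuzzy check, at a distance of 1. Fuzziness compares whole terms, so a typo only stops mattering once the term is complete.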

Is it somehow possible to do phrase_prefix and fuzziness at the same time, so that in the case above the result is still listed if I make a typo along the way?

For example, searching for "Handle" should still match "Handelsstandens Boldklub".

Or what other workarounds are there to achieve this experience? I like phrase_prefix matching because it also supports sloppy matching (so I can search for "Boldklub han" and still get the result).

Or can the above be achieved by using the completion suggester?

asked Aug 24 '16 at 09:08 by Henrik Holm


People also ask

What is match phrase prefix query in Elasticsearch?

The Match Phrase Prefix Query is a full-text query. If you query a full-text (analyzed) field, Elasticsearch first passes the query string through the defined analyzer to produce the list of terms to be queried.

What is fuzzy matching in Elasticsearch?

A fuzzy search finds terms that approximately match a pattern according to some criteria. Elasticsearch supports fuzzy matching using the Levenshtein edit distance: the number of single-character changes needed to turn one term into another.
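As a sketch of the metric itself, here is the textbook dynamic-programming Levenshtein distance in Ruby. Note that Elasticsearch's fuzzy matching by default also counts a transposition of two adjacent characters as a single edit (`fuzzy_transpositions`), which this plain version does not; the method name is just for illustration.

```ruby
# Classic Levenshtein distance: the minimum number of single-character
# insertions, deletions, or substitutions needed to turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

puts levenshtein("kitten", "sitting") # => 3
```

With this plain metric, the swapped letters in "handlesstandens" versus "handelsstandens" cost 2 edits rather than 1.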

What is a match phrase query?

A match phrase query returns documents that contain the words of a provided text, in the same order as provided. The match phrase prefix query works the same way, except that the last term of the provided text is treated as a prefix, matching any words that begin with that term.
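The prefix behaviour can be sketched in a few lines of Ruby. This is a simplification, not Elasticsearch's implementation: lowercasing and splitting on whitespace stand in for the analyzer, slop is fixed at 0, and the `max_expansions` limit on how many indexed terms the prefix may expand to is ignored. The method name is hypothetical.

```ruby
# Simplified match_phrase_prefix: query words must appear consecutively and
# in order; the last query word only needs to be a prefix of a document word.
# Assumption: lowercase + whitespace split stands in for the real analyzer,
# and slop is fixed at 0.
def phrase_prefix_match?(text, query)
  doc = text.downcase.split
  words = query.downcase.split
  return false if words.empty?

  *exact, last = words
  (0..doc.length - words.length).any? do |start|
    exact.each_with_index.all? { |w, k| doc[start + k] == w } &&
      doc[start + exact.length].start_with?(last)
  end
end

name = "Handelsstandens Boldklub"
puts phrase_prefix_match?(name, "Hand")                 # => true
puts phrase_prefix_match?(name, "Handelsstandens Bold") # => true
puts phrase_prefix_match?(name, "Handle")               # => false
puts phrase_prefix_match?(name, "Boldklub han")         # => false (needs slop)
```

The last case is where the question's `slop: 5` matters: with enough slop, Elasticsearch tolerates the reversed word order, which this sketch does not model.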

What is a prefix or fuzzy query?

Queries like the term, prefix, or fuzzy queries are low-level queries that have no analysis phase; they operate on a single term. It is important to remember that the term query looks in the inverted index for the exact term only; it won't match variants such as elvis or Elvis. The Match Phrase Prefix Query, by contrast, is a full-text query.
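A toy inverted index in Ruby illustrates the difference between a term query and an analyzed query. Everything here is a hypothetical stand-in: analysis is just lowercasing and splitting on word characters, and the helper names do not correspond to any Elasticsearch API.

```ruby
# Toy inverted index showing why a term query misses case variants.
# Assumption: analysis is just lowercasing and splitting on word characters.
def analyze(text)
  text.downcase.scan(/\w+/)
end

# Maps each analyzed term to the ids of the documents containing it
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each { |id, text| analyze(text).uniq.each { |term| index[term] << id } }
  index
end

# Term-query stand-in: looks up the string exactly as given, with no analysis
def term_query(index, term)
  index.fetch(term, [])
end

# Match-query stand-in: runs the query text through the same analysis first
def match_query(index, text)
  analyze(text).flat_map { |term| index.fetch(term, []) }.uniq
end

docs = { 1 => "Elvis Presley", 2 => "The King" }
index = build_index(docs)
p term_query(index, "Elvis")  # => []
p term_query(index, "elvis")  # => [1]
p match_query(index, "Elvis") # => [1]
```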


1 Answer

Okay, so after investigating Elasticsearch further, I came to the conclusion that I should use ngrams.

Here is a really good explanation of what ngrams are and how they work: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

Here are the settings and mappings I used (this is elasticsearch-rails syntax):

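settings analysis: {
  filter: {
    # Splits each token into all of its substrings of 2 to 20 characters
    ngram_filter: {
      type: "ngram",
      min_gram: "2",
      max_gram: "20"
    }
  },
  analyzer: {
    # Lowercases each token, then emits its ngrams. With no separate
    # search_analyzer, it is applied to both indexed text and query text.
    ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "ngram_filter"]
    }
  }
} do
  mappings do
    # "string" is the pre-5.0 field type; newer versions use "text".
    # Elasticsearch 7+ also requires index.max_ngram_diff >= 18 for this filter.
    indexes :name, type: "string", analyzer: "ngram_analyzer"
    indexes :country_id, type: "integer"
  end
end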
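To see why ngrams give typo tolerance, here is a plain-Ruby approximation of `ngram_analyzer` (lowercase, then every substring of 2 to 20 characters). It is a sketch under assumptions: whitespace splitting stands in for the `standard` tokenizer, and the helper names are made up. Because the mapping sets only `analyzer` and no separate `search_analyzer`, the same analysis is applied to the query text, so the misspelled "Handle" is compared gram by gram with the indexed name.

```ruby
# Approximation of ngram_analyzer: lowercase each token, then emit every
# substring between MIN_GRAM and MAX_GRAM characters long.
# Assumption: whitespace splitting stands in for the standard tokenizer.
MIN_GRAM = 2
MAX_GRAM = 20

# All substrings of token with length min_gram..max_gram
def ngrams(token, min_gram = MIN_GRAM, max_gram = MAX_GRAM)
  (min_gram..max_gram).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

# Distinct grams produced for a piece of text
def ngram_analyze(text)
  text.downcase.split.flat_map { |token| ngrams(token) }.uniq
end

indexed = ngram_analyze("Handelsstandens Boldklub")
query = ngram_analyze("Handle")

# Grams the misspelled query shares with the indexed name
p query & indexed # => ["ha", "an", "nd", "han", "and", "hand"]
```

The misspelled query still shares six grams with the indexed name, and since a `match` query uses the `or` operator by default, any shared gram is enough for a hit, with the score roughly growing with the overlap. The trade-off is that short grams such as `ha` and `an` also occur in many unrelated names.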

And the query (it actually searches two different indexes at the same time):

{
  query: {
    bool: {
      should: [
        {
          bool: {
            must: [
              { match: { "club.country_id": country.id } },
              { match: { name: query } }
            ]
          }
        },
        {
          bool: {
            must: [
              { match: { country_id: country.id } },
              { match: { name: query } }
            ]
          }
        }
      ],
      minimum_should_match: 1
    }
  }
}

But basically you just need a match or multi_match query, depending on how many fields you want to search in.
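With the ngram analysis providing the typo tolerance, the query itself can stay simple. A minimal sketch of a helper that picks `match` for one field and `multi_match` for several; the method name and the `club.name` field are hypothetical, and the resulting hash could be passed to an elasticsearch-model search call such as `Club.search(...)` (the `Club` model is also an assumption).

```ruby
# Builds a plain match query for one field, or a multi_match query for several.
# The method name and field names are hypothetical examples.
def name_search_query(text, fields: ["name"])
  if fields.length == 1
    { query: { match: { fields.first => text } } }
  else
    { query: { multi_match: { query: text, fields: fields } } }
  end
end

p name_search_query("Handle")
p name_search_query("Handle", fields: ["name", "club.name"])
```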

I hope someone finds this helpful. I was thinking too much in terms of fuzziness instead of ngrams (which I didn't know about before), and that led me in the wrong direction.

answered Oct 17 '22 at 19:10 by Henrik Holm