ElasticSearch phrase prefix search - How do I get the matched phrase?

Tags: autocomplete, elasticsearch

I'm building an autocomplete feature using ElasticSearch. As the user types, I want to show a list of completions from the data, so the user can select one. For example, if the data contains the following phrases:

very unusual
very unlikely
very useful

and the user types:

very u

I want to display the phrases above.

I'm using this query:

  "query": {
    "multi_match": {
      "query": "very u",
      "fields": [
        "name",
        "description",
        "contentBlocks.caption",
        "contentBlocks.text"
      ],
      "type": "phrase_prefix",
      "max_expansions": 10,
      "cutoff_frequency": 0.001
    }
  }

This matches the content I'm looking for, but extracting the matched phrases from the search results is quite awkward. I have been using highlighting, and I collect the matched phrases by parsing the highlights. For example:

    "highlight": {
      "contentBlocks.text": [
        "turned the <em>very</em> <em>unusual</em> doorknob"
      ]
    }

    "highlight": {
      "contentBlocks.text": [
        "invented a <em>very</em> <em>useful</em> mechanism"
      ]
    }
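
As a rough illustration of the highlight-parsing approach described above (a sketch only, not the code actually in use; the function name `matched_phrases` is made up here), adjacent `<em>` tokens can be joined into phrases like this. It assumes the default `<em>` highlight tags and that the highlighted words contain no markup of their own.

```python
import re

# One highlighted token, e.g. "<em>very</em>".
_EM = re.compile(r"<em>([^<]*)</em>")
# A run of highlighted tokens separated only by whitespace.
_EM_RUN = re.compile(r"(?:<em>[^<]*</em>\s*)+")


def matched_phrases(highlight):
    """Collect runs of adjacent <em> tokens from a hit's "highlight" dict."""
    phrases = []
    for fragments in highlight.values():
        for fragment in fragments:
            for run in _EM_RUN.finditer(fragment):
                words = _EM.findall(run.group(0))
                phrases.append(" ".join(words))
    return phrases


if __name__ == "__main__":
    hit_highlight = {
        "contentBlocks.text": [
            "turned the <em>very</em> <em>unusual</em> doorknob"
        ]
    }
    print(matched_phrases(hit_highlight))
```

Highlighted words that are separated by unhighlighted text come back as separate phrases, which is part of what makes this approach awkward.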

What's the right way to do this?

"Phrase Suggester" might be capable of doing what I have described, but it is not at all obvious how you would get it to do that.

I have indexed the fields of interest (for example, "description") as follows:

  "description" : {
    "index_analyzer" : "snowball_stem",
    "search_analyzer" : "snowball_stem",
    "type" : "string",
    "fields" : {
      "autocomplete" : {
        "index_analyzer" : "shingle_analyzer",
        "search_analyzer" : "shingle_analyzer",
        "type" : "string"
      }
    }
  },

I am using the snowball_stem analyzer for search, and the shingle_analyzer for the autocomplete function. shingle_analyzer looks like this:

"settings" : {
    "analysis" : {
        "analyzer" : {
            "shingle_analyzer" : {
                "type" : "custom",
                "tokenizer" : "standard",
                "filter" : [
                    "standard",
                    "lowercase",
                    "shingle_filter"
                ],
                "char_filter" : [
                    "html_strip"
                ]
            }
        },
        "filter" : {
            "shingle_filter" : {
                "type" : "shingle",
                "min_shingle_size" : 2,
                "max_shingle_size" : 2
            }
        }
    }
},
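
To make concrete what this analyzer puts into the index, here is an approximate Python sketch of the lowercase plus 2-word shingle chain. It is only an approximation: a simple regex stands in for the standard tokenizer, `html_strip` is ignored, and the function name `shingle_tokens` is hypothetical. The shingle filter emits single words as well as shingles unless `output_unigrams` is turned off, which the sketch mirrors.

```python
import re


def shingle_tokens(text, size=2):
    """Approximate the tokens produced by lowercase + shingle filtering."""
    # Crude stand-in for the standard tokenizer: runs of word characters.
    words = re.findall(r"\w+", text.lower())
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)  # unigram, emitted by default (output_unigrams)
        if i + size <= len(words):
            tokens.append(" ".join(words[i:i + size]))  # the shingle itself
    return tokens


if __name__ == "__main__":
    print(shingle_tokens("Very unusual doorknob"))
```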

The documentation for the phrase suggester seems to be totally oriented toward "spelling correction" rather than completion. Since what I'm after is completion, I set the direct generator's min_word_length and prefix_length to the length of the input text, in this case, 2.

I put together a suggestion query based on the documentation:

{
    "text" : "sa",
    "autocomplete_description" : {
        "phrase" : {
            "analyzer" : "standard",
            "field" : "description.autocomplete",
            "size" : 10,
            "max_errors" : 2,
            "confidence" : 0.0,
            "gram_size" : 2,
            "direct_generator" : [
                {
                    "field" : "description.autocomplete",
                    "suggest_mode" : "always",
                    "size" : 10,
                    "min_word_length" : 2,
                    "prefix_length" : 2
                }
            ]
        }
    }
}

This search for suggestions for "sa" comes up with the following results:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "autocomplete_description" : [ {
    "text" : "sa",
    "offset" : 0,
    "length" : 2,
    "options" : [ {
      "text" : "say",
      "score" : 0.012580795
    }, {
      "text" : "sa",
      "score" : 0.01127677
    }, {
      "text" : "san",
      "score" : 0.0106529845
    }, {
      "text" : "sad",
      "score" : 0.008533429
    }, {
      "text" : "saw",
      "score" : 0.008107899
    }, {
      "text" : "sam",
      "score" : 0.007155634
    } ]
  } ]
}

What I expect to find for the input "sa" is words that begin with "sa" of any length. Why does it only return words of two or three characters? Why does it only return six options? The multi_match phrase_prefix query I've been using finds many longer words beginning with "sa", such as "saving", "sassy", "safari", and "salad".

When I search for suggestions for multi-word text, such as "one or" (which occurs plenty of times in the data), it finds nothing. The multi_match phrase_prefix query finds "one or more", "one or the", "one, or you", and "one or both".

How can I get this suggester to do what I want?

asked Apr 23 '14 22:04

David Haimson

1 Answer

You can get roughly what you want with the completion suggester. The main problem is that it is no longer search-aware. You can partly work around this by adding a suggester context, but contexts only work as filters and don't take the search text into account.
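
For reference, a minimal sketch of what a completion-suggester setup might look like, written as Python dicts. It assumes the request shape used by recent Elasticsearch versions (a `suggest` section in a `_search` body with a `prefix` key); the 1.x releases current when the question was asked use a different request shape. The field name `suggest`, the suggestion name `phrase_completions`, and the helper name are all made up for illustration.

```python
# Mapping fragment: a dedicated field of type "completion".
COMPLETION_MAPPING = {
    "properties": {
        "suggest": {"type": "completion"}  # hypothetical field name
    }
}


def completion_suggest_body(prefix, field="suggest", size=10):
    """Build a _search body asking the completion suggester for a prefix."""
    return {
        "suggest": {
            "phrase_completions": {  # arbitrary name for this suggestion
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(completion_suggest_body("very u"), indent=2))
```

Each document would need whole phrases such as "very unusual" indexed into that field as inputs, since the suggester completes from the start of an input rather than from arbitrary positions in the text.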

The only way that I know of to get the "best" behavior (context aware search completions) is to do the following:

1. Create a suggestions field where the text is tokenized as you would want the user to see it (probably with the standard analyzer, perhaps with a 2-shingle token filter added).
2. Say the user has typed the incomplete query "very un". Behind the scenes, issue a search for "very", then use a terms aggregation on the suggestions field to get a list of terms that match the search context, limiting the terms returned with "include": "un.*".
3. The resulting list will look like [unusual, unlikely, uncool].
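
The steps above could be sketched as a single request body, built here in Python. This is an illustration under assumptions, not a tested recipe: the search field `description` is borrowed from the question, `suggestions` is the field from step 1, and the helper name and the aggregation name `completions` are invented. The last word of the input is treated as the partial word and escaped so that it is safe to use inside the `include` regular expression.

```python
import re

# Characters that act as operators in Lucene regular expressions.
_LUCENE_RESERVED = re.compile(r'([.?+*|{}\[\]()"\\#@&<>~])')


def context_completion_request(user_input, search_field="description",
                               suggest_field="suggestions", size=10):
    """Search on the complete words, aggregate terms starting with the partial one."""
    words = user_input.split()
    if not words:
        raise ValueError("empty input")
    *complete, partial = words
    prefix = _LUCENE_RESERVED.sub(r"\\\1", partial.lower())
    if complete:
        query = {"match": {search_field: " ".join(complete)}}
    else:
        query = {"match_all": {}}  # nothing typed yet besides the partial word
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": query,
        "aggs": {
            "completions": {
                "terms": {
                    "field": suggest_field,
                    "include": prefix + ".*",
                    "size": size,
                }
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(context_completion_request("very un"), indent=2))
```

The term buckets in the response would then be the candidate completions, ordered by how many matching documents contain them.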

The main drawback of this method, especially in a sharded environment, is that it takes a lot of queries and pulls a very high-cardinality field (suggestions) into memory, so I don't know whether it's practically feasible. It may be better to go back to the completion suggester. If you try either of these, I'm interested in hearing how it goes.

answered Oct 21 '22 14:10

JnBrymn
